The performance of Primary Care Trusts in England is assessed and published using a number of different performance indicators. Our study has two broad purposes. First, we wanted to find out whether pairs of indicators that purport to measure similar aspects of quality are correlated (as would be expected if they are both valid measures of the same construct). Second, we wanted to find out whether broad (global) indicators are correlated with any particular features of Primary Care Trusts, such as expenditure per capita.
Cross sectional quantitative analysis using data from six 2004/05 PCT performance indicators for 303 English Primary Care Trusts from four sources in the public domain: Star Rating, aggregated Quality and Outcomes Framework scores, Dr Foster mortality index, Dr Foster equity index (heart by-pass and hip replacements), NHS Litigation Authority Risk Management standards and Patient Satisfaction scores from the Star Ratings. Forward stepwise multiple regression analysis to determine the effect of Primary Care Trust characteristics on performance.
Star Rating and Quality and Outcomes Framework total, both summary measures of global quality, were not correlated with each other (F = 0.66, p = 0.57). There were however positive correlations between Quality and Outcomes Framework total and patient satisfaction (r = 0.61, p < 0.001) and between screening/'additional services' indicators on the Star Ratings and Quality and Outcomes Framework (F = 24, p < 0.001). There was no correlation between different measures of access to services. Likewise we found no relationship between either Star Rating or Litigation Authority Standards and hospital mortality (F = 0.61, p = 0.61; F = 0.31, p = 0.73).
Performance assessment in healthcare remains on the Government's agenda, with new core and developmental standards set to replace the Star Ratings in 2006. Yet the results of this analysis provide little evidence that the current indicators have sufficient construct validity to measure the underlying concept of quality, except when the specific area of screening is considered.
Public services, including health, have increasingly been subjected to performance assessments, designed to fulfil the Government's "commitment to providing patients and the general public with comprehensive, easily understandable information on the performance of their local health services". Furthermore, performance assessments in health care should promote patient involvement, provide accountability and enhance patient choice. However, a recent action research report has highlighted that the UK public do not like performance league tables and consider sources of information on quality as inadequate. In addition, Star Ratings have induced adverse effects, such as distorted clinical priorities, bullying and reduced morale in acute hospital trusts, often resulting in institutional stigma. Trusts may also game with definitions of required standards, such as determining when the 8 minute ambulance call-out time actually starts.
Theoretically, Pringle and colleagues identify twelve methodological attributes of an ideal indicator: validity, communicable, effective, reliable, objective, available, contextual, attributable, interpretation, comparable, remediable and repeatable (see Table 1 for definitions). There is currently no Performance Indicator that fulfils all of these attributes and the existence of multiple indicators raises questions over which should be used – we return to this issue below. A further difficulty arises since some Performance Indicators are composite measures across numerous domains. While composites present a "big picture", scores are sensitive to the weighting and aggregation processes applied. One essential 'acid test' considered in this paper is the construct validity of the indicator (a combination of the attributes "effective" and "comparable" used by Pringle and colleagues). Construct validity implies that the indicators measure what they are intended to measure (in this case, quality). Construct validity is essential if Performance Indicators are to be used fruitfully by the public in their newly-acquired choice of providers or by regulators as a means of imposing sanctions or rewards.
Table 1. Descriptions of attributes of ideal quality indicators
This paper focuses on six Performance Indicators available in the public domain for the 303 English Primary Care Trusts (PCTs). Since no Gold Standard indicator exists, we assess the correlations between different pairs of indicators expected or hypothesised to be related. The underlying logic is that correlation is a necessary, but not sufficient, condition for construct validity. If no correlation exists, then at least one of the indicators must be an invalid measurement of a common construct. The existence of correlation is not proof of construct validity, since this requires certainty regarding causation. However, correlation at least suggests that, whatever two correlated indicators are measuring, it is the same thing. Given face validity, this may be the best evidence of construct validity obtainable in circumstances where there is no Gold Standard.
Given the existence of multiple indicators, a more holistic approach to quality assessment is to consider the 'within PCT' variance across the six indicators. Differences in the relative performance of a PCT across the separate indicators may suggest that quality is not consistent across the PCT (providing that the indicators do, in fact, have construct validity). We examine 'within PCT' variances in this paper, acknowledging the reviewer who suggested this idea. Lastly, we have identified a number of features of a PCT, such as expenditure per capita, which might be correlated with the various performance measurements. We examine these in a statistical model to seek associations which might be informative.
The analysis in this paper is a cross sectional quantitative analysis of six Performance Indicators and PCT characteristics for the 303 English PCTs.
The most recent data on the six Performance Indicators (Table 2) used in this analysis were downloaded from the internet during August 2005. The indicators are: Star Rating, Quality and Outcomes Framework (QOF) total, Dr. Foster Mortality index, Dr. Foster Equity index, NHS Litigation Authority (NHSLA) Risk Assessment and patient satisfaction total from the Star Ratings. The information was combined into a database using Stata v.7 (Stata Corp, Texas). Data on possible explanatory variables were then added to the database (Table 2). More detailed information on the Performance Indicators and explanatory variables can be found in 1.
Table 2. PCT performance indicators and explanatory variables
Relationships between Performance Indicators are assessed across two domains: pairs of indicators purporting to measure the same underlying health construct (e.g. access to services) and pairs of indicators hypothesised to be related (e.g. higher standards of care and patient satisfaction). We use both composite Performance Indicators and their components in these analyses. In identifying relationships, consideration was given to the health care setting: while a PCT may be able to foster a culture of excellence across all organisations, it may be inappropriate to expect a relationship between an indicator based solely on general practice and another based solely on hospital care. Initial assessments of relationships were undertaken using scatter diagrams if both variables were continuous with subsequent calculation of Pearson correlation coefficients if relationships appeared to be linear. For pairs including one categorical and one continuous variable, we use box and whisker diagrams and/or mean score analyses.
We apply a basic approach to assessing the 'within PCT' variance across the six Performance Indicators (based on that of Fahey and Gibberd). A PCT is given one point for each indicator if the PCT's score on the indicator is better than the mean, but loses one point if the score is worse than the mean. No points are accrued or lost if the PCT's score is equal to the mean. For the two categorical variables, PCTs with none or one Star lose one point, those with two Stars accrue no points and those with three Stars gain one point; NHSLA Risk Assessments are scored as -1 (Level 0), 0 (Level 1A) and +1 (Level 1B). We then find the total number of points for each PCT, giving a possible range of -6 (below average on all six indicators) to +6 (above average on all six indicators). An examination of the resulting score distribution provides an insight into the holistic 'quality' of the PCTs.
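The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis (which was carried out in Stata): the field names, the example values and the assumed direction of "better" for each continuous indicator (lower mortality, higher equity ratio) are all hypothetical choices made for the sketch.

```python
from statistics import mean

# Assumed direction of "better" for each continuous indicator (hypothetical field names).
HIGHER_IS_BETTER = {"qof_total": True, "mortality": False,
                    "equity": True, "satisfaction": True}
# Points for the two categorical indicators, as described in the text.
STAR_POINTS = {0: -1, 1: -1, 2: 0, 3: 1}
NHSLA_POINTS = {"0": -1, "1A": 0, "1B": 1}

def holistic_scores(pcts):
    """Return {PCT name: total points}, ranging from -6 to +6."""
    means = {k: mean(p[k] for p in pcts) for k in HIGHER_IS_BETTER}
    scores = {}
    for p in pcts:
        total = STAR_POINTS[p["stars"]] + NHSLA_POINTS[p["nhsla"]]
        for key, higher_better in HIGHER_IS_BETTER.items():
            diff = p[key] - means[key]
            if not higher_better:
                diff = -diff  # flip sign so that positive always means "better"
            total += (diff > 0) - (diff < 0)  # +1, 0 or -1
        scores[p["name"]] = total
    return scores

# Three made-up PCTs, purely for illustration.
example = [
    {"name": "A", "stars": 3, "nhsla": "1B", "qof_total": 1000,
     "mortality": 90, "equity": 110, "satisfaction": 80},
    {"name": "B", "stars": 0, "nhsla": "0", "qof_total": 900,
     "mortality": 110, "equity": 90, "satisfaction": 70},
    {"name": "C", "stars": 2, "nhsla": "1A", "qof_total": 950,
     "mortality": 100, "equity": 100, "satisfaction": 75},
]
print(holistic_scores(example))
```

In this made-up example, PCT A is above average on every indicator, B is below average on every indicator, and C sits exactly on every mean, so the three totals span the full possible range.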
The effect of the five explanatory variables (Table 2) on the Performance Indicators was first explored using forward stepwise multiple regression analyses, based on ordinary least squares for continuous dependent variables and ordered logit for categorical variables. A similar method is used by Sutton and McLean in a practice-level analysis for 60 general practices in Scotland. Jha and colleagues also use this approach in their analysis of US hospital performance. Relationships between pairs of dependent and explanatory variables identified in the regressions were demonstrated using scatter diagrams, box and whisker diagrams, Pearson Correlation coefficients and/or mean score analyses.
The first analysis investigated pairs of indicators purporting to measure the same underlying health care construct. Here, correlations would help validate the indicators, with independent measures of the same construct resulting in analogous PCT ratings. The first pair of indicators is Star Rating and QOF total, as both are composite primary care performance measures. Figure 1 shows a box and whisker diagram that analyses QOF totals for PCTs with each Star Rating. It is clear that these Performance Indicators are not related; a one-way Anova confirms no differences between QOF means across Star Ratings (F = 0.66, p = 0.57).
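For readers unfamiliar with the one-way Anova used here, the F statistic is the ratio of between-group to within-group mean squares. The following is a minimal sketch using only the Python standard library; the function name and the toy data are hypothetical, and the p-value is not computed because that requires the F distribution (available in statistical packages).

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data: three groups (standing in for QOF totals within Star Rating categories).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w)
```

An F value close to or below 1, as reported for QOF totals across Star Ratings, indicates that the group means differ no more than would be expected from the spread of scores within each group.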
Figure 1. Box plot of QOF totals by Star Rating.
The Additional Services Domain on the QOF and the Improving Health category on the Star Ratings both purport to measure screening and other preventative services in general practice. Specific overlaps are cervical screening, child health surveillance and contraceptive services. Not surprisingly, there is a positive relationship between Additional Services and Improving Health assessments (Table 3).
Table 3. Relationship between Improving Health rating (Star) and Additional Services score (QOF)
Four indicators measure access to services: Access Bonus on the QOF, Access to Quality Services category on the Star Ratings, Equity from the Dr. Foster ratings and the Access and Waiting section on the Patient Satisfaction survey. Analysis of pairs of indicators where health care domains overlap provides insufficient evidence to suggest that these indicators are measuring the same underlying concept. The specific results of this analysis are shown in 2. It is not possible to say whether any of the access measures are 'better' than the others.
The second analysis investigated pairs of indicators measuring different health care concepts but which are hypothesised to be related. Relationships provide evidence that different indicators are valid in that they measure the general concept of 'quality' or 'performance'. First, we hypothesise that PCTs with higher Star Ratings or NHSLA Ratings would have lower hospital mortality. This is because Star Ratings provide an overall measure of PCT quality that incorporates elements of hospital care; whilst NHSLA Ratings are based on safety procedures that, if filtered down to the hospitals within a PCT's commissioning remit, should have a positive effect on the standard of care. A mean score analysis suggests that a higher Star Rating does not imply lower hospital mortality (F = 0.61, p = 0.61). PCTs with no Stars have a mean mortality ratio of 102.7, compared with 99.9 for 1 Starred PCTs, 100.2 (2 Stars) and 101.5 (3 Stars). There is a similar result for the NHSLA Rating (F = 0.31, p = 0.73): the mean mortality ratio is 99.3 for Level 0 PCTs, 100.6 for Level 1A PCTs and 100.5 for Level 1B PCTs.
Second, we hypothesise that the better the overall standard of care in general practice (QOF total), the more satisfied are the patients. Figure 2 shows that there is a positive relationship between the quality of care and patient satisfaction (Pearson's r = 0.61, p < 0.001). The concentration of points towards the top right of the scatter plot suggests both variables are negatively skewed due to ceiling effects. One reviewer commented that the positive correlation may be driven by the outliers. Indeed, if we restrict the Pearson's r calculation to the 205 PCTs whose QOF and patient satisfaction totals both lie between the 10th and 90th centiles of their respective distributions, the coefficient falls to 0.38, although this is still statistically significant at p < 0.01.
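The sensitivity check described above, recomputing Pearson's r after excluding PCTs outside the 10th to 90th centiles on either variable, can be sketched as follows. This is an illustration under stated assumptions rather than the original Stata procedure: the function names are hypothetical, and the "inclusive" centile definition is one of several conventions and may not match the one used in the analysis.

```python
from math import sqrt
from statistics import mean, quantiles

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def central_pearson_r(xs, ys):
    """Pearson's r restricted to pairs where BOTH values lie between the
    10th and 90th centiles of their own distribution. Returns (r, n_kept)."""
    dec_x = quantiles(xs, n=10, method="inclusive")  # 9 cut points
    dec_y = quantiles(ys, n=10, method="inclusive")
    lo_x, hi_x = dec_x[0], dec_x[-1]
    lo_y, hi_y = dec_y[0], dec_y[-1]
    kept = [(x, y) for x, y in zip(xs, ys)
            if lo_x <= x <= hi_x and lo_y <= y <= hi_y]
    kept_x, kept_y = zip(*kept)
    return pearson_r(kept_x, kept_y), len(kept)

# Toy data only: a perfectly linear relationship across 21 points.
xs = list(range(1, 22))
ys = [2 * x for x in xs]
print(central_pearson_r(xs, ys))
```

Because a pair is dropped if either value is extreme, the retained sample is usually smaller than 80% of the original, which is consistent with 205 of the 303 PCTs remaining in the restricted calculation.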
Figure 2. Correlation between QOF total and patient satisfaction total.
Holistic quality assessment
Our assessment of 'within PCT' variances, in which a score of -6 indicated a PCT with below average performance on all six indicators and a score of +6 a PCT with above average performance, resulted in a fairly symmetrical distribution of scores (Figure 3). 136 (45%) of PCTs had a score of -1, 0 or +1, with just three PCTs (1%) scoring -6 and six PCTs (2%) scoring +6. The distribution of the total scores in Figure 3 is consistent with the hypothesis that the individual Performance Indicators were allocated randomly and supports the finding that correlations between Performance Indicators are weak.
Figure 3. Distribution of 'holistic' quality scores across PCTs.
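To give a sense of what "allocated randomly" would imply for the total scores, the distribution of the sum of independent indicator scores can be computed exactly by convolution. The sketch below is illustrative only: it assumes independence between indicators, and the per-indicator probabilities used in the example (an even chance of +1 or -1 on each of six indicators, with no zeros) are a simplifying assumption, not the observed proportions.

```python
def score_distribution(indicator_probs):
    """Exact distribution of the total score when indicators are independent.

    indicator_probs: list of dicts mapping points (-1, 0, +1) to probability,
    one dict per indicator. Returns {total score: probability}.
    """
    dist = {0: 1.0}
    for probs in indicator_probs:
        new_dist = {}
        for total, p_total in dist.items():
            for points, p_points in probs.items():
                key = total + points
                new_dist[key] = new_dist.get(key, 0.0) + p_total * p_points
        dist = new_dist
    return dist

# Simplifying assumption: each of six indicators is +1 or -1 with equal chance.
null_dist = score_distribution([{-1: 0.5, 1: 0.5}] * 6)
for total in sorted(null_dist):
    print(total, round(null_dist[total], 4))
```

Under this simplified null, a total of +6 (or -6) has probability 1/64, roughly 1.6%, which is of the same order as the 1% and 2% of PCTs observed at the two extremes.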
Accounting for differences in PCT performance
Table 4. Final forward stepwise regression results
The North East, North West and London have a greater percentage of 3 Star PCTs than the national average (50%, 40% and 35% respectively compared to 19%). PCTs in the East are the least likely to have 3 Stars, with only 5% achieving this rating. There is a weak inverse correlation between Star Rating and PCT expenditure per capita (F = 6.41, p < 0.001). Mean expenditure per capita for PCTs with no or one Star is £1,080 compared to £1,034 for the higher rated PCTs. There is also an inverse correlation between PCT size – the number of registered patients – and Star Rating (F = 4.51, p = 0.004). The lower rated PCTs tend to have more patients (a mean of 196,000 compared to 166,000 for PCTs with 2 or 3 Stars).
The only explanatory variable with a significant influence on a PCT's QOF total is the Index of Multiple Deprivation. There is a negative linear correlation between the Index of Multiple Deprivation and QOF, as shown in Figure 4 (Pearson's r = -0.59, p < 0.001). If the effect of outliers is removed by restricting the sample to the 201 PCTs whose Index of Multiple Deprivation and QOF scores lie between the 10th and 90th centiles of both distributions, the Pearson's r is reduced to -0.37 (p < 0.01). This result suggests that PCTs with the highest deprivation have the lowest QOF scores and will thus attract the least additional funding. By contrast, in their analysis of practice-level data in Scotland, Sutton and McLean find that deprivation has a positive effect on scores for clinical and holistic care.
Figure 4. Correlation between IMD score and QOF total.
None of the explanatory variables are a good determinant of hospital mortality rates. There is evidence of small negative relationships between mortality and the number of General Practitioners per capita (Pearson's r = -0.32, p < 0.001) and PCT expenditure per capita (r = -0.28, p < 0.001).
Mean equity ratio
Admissions in London (mean ratio = 84.4) are less equitable than admissions in all other regions (mean ratio = 101.2; t = 5.21, p < 0.001). The equity ratio has already been adjusted for case mix and thus differences in patient demographics may not explain this result.
NHSLA Ratings vary by region. Compared to a national average of 29%, the percentage of PCTs with the highest rating (1B) is highest in Yorkshire and Humberside (41%) and the West Midlands (40%) and lowest in the East (15%) and North East (19%). However, the effect of region was not statistically significant in the ordered logit analysis (p > 0.05).
There are two noteworthy influences on patient satisfaction. First, there is a negative relationship between Index of Multiple Deprivation and patient satisfaction (Pearson's r = -0.46, p < 0.001): i.e. poorer areas have lower satisfaction. This finding appears consistent over time, as MORI report a similar result for 2001–3. Second, patients in London report lower satisfaction than patients in other regions (with mean scores of 73% and 78% respectively; t = 9.13, p < 0.001).
Our analysis provided evidence of construct validity for measures of screening and preventative health care in the Star Ratings and QOF but not for different measures of access to services. At a more general level, there was no relationship between Star Rating and QOF total. Some may argue this result would be expected since it is asking a lot of PCTs to engender a level of quality across all the organisations within their commissioning remit, including both general practice and hospital care. In turn, such an argument invokes debate over the appropriateness of assessing health care performance at PCT level.
There is evidence that patients report higher satisfaction with PCTs where general practices achieve higher QOF scores. However, this may be a spurious association since both QOF scores and patient satisfaction are negatively related to deprivation and hence the direction of causality is not clear. The correlations are also partly driven by the outlying PCTs in the distributions. The relationship between QOF score and deprivation contrasts with that reported by Sutton and McLean, which may be because our analysis is at PCT, rather than practice level, or because our analysis is based on the English, rather than the Scottish system. We also find that quality is not consistent across the six Performance Indicators in many PCTs.
Region was found to be a determinant of performance across a number of indicators. However as a further illustration of discrepancies between different Performance Indicators, patients in London receive the lowest equity in hospital admissions and report the lowest satisfaction with their care, yet have a relatively high proportion of 3 Star PCTs.
This paper considered a limited number of explanatory variables: a wider range is used by Jacobs and Smith in their analysis of determinants of Star Ratings for acute hospital trusts. Other possible explanatory variables could focus on organisational characteristics of PCTs and health care organisations, which is an approach taken by Sutton and McLean. Alternatively, one possible variable omitted from this analysis is the age distribution of patients. Taylor et al. report that the proportion of patients over 65 had a positive influence on patient satisfaction in 2002/3, and the tendency for older patients to give higher ratings of their care is noted elsewhere. Consideration of ethnic differences may also explain differences in patient satisfaction, since patients from ethnic minority groups have been reported to be harder to satisfy.
The analysis in this paper is based on a snapshot using the latest available data, requiring an assumption that measurements and explanatory variables do not change significantly over time. An alternative, longitudinal approach to measuring performance in general practice using disease-specific indicators is reported by Campbell and colleagues. Based on this approach, an assessment of PCTs' ability to improve standards across different Performance Indicators may provide a different perspective to that considered here.
The results in this paper cast doubts on whether any of the available Performance Indicators help the public to accurately assess the level of care received at their PCT, although Marshall and colleagues question whether patients actually value such assessments. In future, it may be relevant to consider if quality can be related to health, although evidence from Canada suggests that such relationships are unlikely. Performance Indicators should also provide PCTs with an incentive to improve standards of care, yet if Performance Indicators are not a valid measure of performance then scarce resources may be directed to the wrong ends.
A more detailed analysis of multi-faceted indicators, to identify outliers on particular components of the indicators, may help PCTs prioritise areas for improvement. This analysis could be based on the methods of Gibberd and colleagues, and we are currently planning a study of this nature based on the 2006 Health Check data, once these data become available.
Given that a PCT's responsibilities are so multi-faceted, health care performance may not be best measured at PCT level. QOF totals, mortality and mean equity have been aggregated to PCT level from either general practice or hospital level and the aggregation process is likely to mask individual differences in performance across the PCT. A possible solution is for PCT level indicators to be focused only on aspects of care within the direct control of the PCT, rather than indirect aspects that can only be assessed in the care organisations commissioned by the PCT.
The Star Rating scheme is currently being revised to include monitoring of both core standards and progress towards developmental targets, which will introduce an improvement element to the existing purposes of quality assurance and accountability [18,19]. We await an evaluation of this Annual Health Check in due course, including an assessment of the validity of the indicators.
List of abbreviations
NHSLA: National Health Service Litigation Authority
PCT: Primary Care Trust
QOF: Quality and Outcomes Framework
CB and RL developed the study. CB undertook the data gathering and analysis. CB and RL drafted the paper.
Competing interests declaration
RL and CB have been invited to/have attended Safety Expert Reference Group meetings at the Healthcare Commission regarding the Safety developmental standards within the Annual Health Check.
Data on multiple deprivation at PCT level have been made available by The National Database for Primary Care Groups and Trusts. This database is a product of the National Primary Care Research and Development Centre at the University of Manchester. It was devised by Dr Deborah Baker, who is the database director. The database was constructed by Justin Hayes at the Regional Research Laboratory, School of Geography, University of Manchester (Director: Dr Robert Barr); SEE IT consultancy designed and built the map interface. We are grateful to Andrew Wagner, Mark Hann and David Reeves (NPCRDC) for their considerable hard work in cleaning and validating the data sets. Andrew Wagner is the database manager email@example.com.
We would also like to thank the two peer-reviewers of this paper, who provided some helpful suggestions for improvements.
Department of Health. Performance Ratings [http://www.performance.doh.gov.uk/performanceratings/2002/national.html]
JAMA 2002, 283:1866-1874.
The Lancet 2004, 363:1147-1152.
J R Statist Soc A 2005, 168:1-27.
doi:10.1136/bmj.38742.554468.55 (published 8 February 2006).
Jnl Epidemiology and Comm Health 2002, 56:482.
doi:10.1136/bmj.38632.611123.AE (published 28 October 2005).
Int Jnl Qual Health Care 2004, 16:i37-i43.