Validity and reliability of a multiple-group measurement scale for interprofessional collaboration

Kenaszchuk, Chris; Reeves, Scott; Nicholas, David; Zwarenstein, Merrick

doi:10.1186/1472-6963-10-83

Research article
Open access
Published: 30 March 2010


BMC Health Services Research volume 10, Article number: 83 (2010)


Abstract

Background

Many measurement scales for interprofessional collaboration are developed for one health professional group, typically nurses. Evaluating interprofessional collaborative relationships can benefit from employing a measurement scale suitable for multiple health provider groups, including physicians and other health professionals. To this end, the paper begins development of a new interprofessional collaboration measurement scale designed for use with nurses, physicians, and other professionals practicing in contemporary acute care settings. The paper investigates validity and reliability of data from nurses evaluating interprofessional collaboration of physicians and shows initial results for other rater/target combinations.

Methods

Items from a published scale originally designed for nurses were adapted to a round robin proxy report format appropriate for multiple health provider groups. Registered nurses, physicians, and allied health professionals practicing in inpatient wards/services of 15 community and academic hospitals in Toronto, Canada completed the adapted scale. Exploratory and confirmatory factor analysis of responses to the adapted scale examined dimensionality, construct and concurrent validity, and reliability of nurses' response data. Correlations between the adapted scale, the nurse-physician relations subscale of the Nursing Work Index, and the Attitudes Toward Health Care Teams Scale were calculated. Differences of mean scores on the Nursing Work Index and the interprofessional collaboration scale were compared between hospitals.

Results

Exploratory factor analysis revealed 3 factors in the adapted interprofessional collaboration scale - labeled Communication, Accommodation, and Isolation - which were subsequently corroborated by confirmatory factor analysis. Nurses' scale responses about physician collaboration had convergent, discriminant, and concurrent validity, and acceptable reliability.

Conclusion

The new scale is suitable for use with nurses assessing physicians. The scale may yield valid and reliable data from physicians and others, but measurement equivalence and other properties of the scale should be investigated before it is used with multiple health professional groups.


Background

Interprofessional collaboration (IPC) occurs when individuals from different health professions communicate and make decisions about a patient's health care based on shared knowledge and skills [1]. Collaborative practice among health care providers is coming to be viewed as a key component of strengthening Canada's health care system [2–4]. Interprofessional collaboration is expected to maximize health human resources, promote habits and customs leading to safer patient care, and increase satisfaction among patients and providers in Canada's health care institutions [5]. In the province of Ontario, an advisory committee to the Ministry of Health and Long-Term Care (MOHLTC) called for an evaluation framework to provide "evidence on the outcomes and benefits of interprofessional care," and for "sharing collective performance measures among peers" [5]. Clinicians trained in a varied array of health disciplines practice with nurses and physicians in many health care settings, including sites such as primary care and family medicine, emergent, intensive, and acute inpatient hospital care, and complex continuing care and rehabilitation [6]. A recognized need is emerging to broaden research and evaluation from the traditional focus on nurses and physicians to include other health professionals who have been regarded as allied with medicine and nursing. The Interprofessional Care Steering Committee emphasized this point by noting that, "in order to be inclusive and successful, all types of health caregivers must participate in implementing recommendations" presented in its action plan for interprofessionality [5, 13].

The HealthForce Ontario program of the MOHLTC has funded projects to enhance the evidence base for IPC. We participated in a recent project funded by MOHLTC. A significant component of the project involved explicating some of our views of the status of IPC measurement. Accordingly, the present paper discusses a novel approach to measurement instrumentation. It presents initial psychometric results of a measurement scale designed for use with multiple health care provider groups in interprofessional collaboration research projects with measurement objectives.

Rationale

The centerpiece of our approach was a commitment to obtain measurement survey coverage of all clinicians working in acute care wards using survey items constructed in a way that makes clinician respondents 'raters' of targeted clinician groups. In other behavioural research disciplines this structure has been called a round robin design [7, 8]. To implement this structure we required an instrument designed for three groups of clinicians: physicians, nurses, and other regulated health care professionals. To make interprofessional comparisons, we needed survey items measured on a common scale. We therefore sought a measurement instrument that had been developed for use with multiple groups of health professionals working in inpatient acute care hospital wards. Our search was guided in part by two reviews of instruments for nurse-physician collaboration [9, 10]. In addition, we conducted a non-systematic examination of peer-reviewed literature in nursing, allied health care, and medicine journals, seeking IPC instruments developed for multiple rater/target groups and used in a round robin format. We found no instrument with these characteristics and concluded that a suitable instrument was not available.

Two results of our review were influential. First, scales focused on relationships between two groups of health professionals, usually nurses and another group. Nurses and physicians should be central figures in measurement schemes but, in our opinion, not so dominant as to suppress considerations of professions allied with medicine and nursing, e.g., therapists, social workers. Other scales did not present common items to all of the health professional groups for whom the survey was intended.

Second, we were uncomfortable with decisions made (and not made) in the factor analytic approaches undergirding the development of scales presently in the literature. Many were developed using exploratory factor analysis (EFA). As implied by its name, EFA is exploratory, useful for initial investigation and development of scales to suggest factor structure dimensionality. EFA can be foundational for theoretical and hypothesis-driven refinements investigated in a confirmatory factor analysis (CFA) framework. However, IPC researchers should recognize that EFA is primarily a data-driven method. As construct validity is clarified, one would expect (and hope) to see IPC measurement advance from EFA to CFA, but this progression appears largely not to have occurred. As well, factor analysis experts have argued that EFA should not be used as a basis for final determination of a construct because EFA-based factor structures may not be reproducible in other data sets [11–13].

A guarded position towards EFA methods for IPC instruments was also motivated by a review [14] of factor analyses in a prominent nursing research journal. Some of the review's negative descriptions corresponded with our own reading of the IPC scale literature, from which three problems are highlighted. First, published factor analytic accounts do not report enough information about major decision points; reporting is highly selective. Second, sample sizes are at the low range of recommendations, which themselves are estimates that vary widely and appear with little support. Third, default software package options and suboptimal practices are often used, such as the well-known Kaiser-Guttman rule for deciding the number of factors to retain, also known as the 'eigenvalues > 1' criterion [15–17]. This method can be problematic in some circumstances [18, 19], particularly given that the benefits of contemporary factor retention methods such as parallel analysis and bootstrapping have been demonstrated.
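To make the contrast between the two retention rules concrete, the following is a minimal sketch of parallel analysis next to the 'eigenvalues > 1' count. It is an illustration only: it uses Pearson correlations on synthetic continuous data (the study itself analyzed polychoric correlations in Mplus), and the function name `parallel_analysis`, the 95th-percentile threshold, and all simulated data are assumptions introduced here.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the Pearson correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Compare observed eigenvalues with those of random data of equal shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        simulated[s] = sorted_eigenvalues(rng.standard_normal((n, p)))
    thresholds = np.percentile(simulated, percentile, axis=0)
    # Count leading eigenvalues that exceed the random-data threshold.
    exceeds = observed > thresholds
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return observed, thresholds, n_retain

# Synthetic data with two underlying factors (hypothetical, not study data).
rng = np.random.default_rng(1)
latent = rng.standard_normal((300, 2))
loadings = np.zeros((8, 2))
loadings[:4, 0] = 0.8
loadings[4:, 1] = 0.8
demo = latent @ loadings.T + 0.6 * rng.standard_normal((300, 8))
observed, thresholds, n_retain = parallel_analysis(demo)

# Pure noise: the Kaiser-Guttman rule still counts eigenvalues above 1.
noise = rng.standard_normal((144, 14))
kaiser_noise = int((sorted_eigenvalues(noise) > 1.0).sum())
print(n_retain, kaiser_noise)
```

The noise case shows why the fixed cutoff can mislead: sampling error alone pushes some eigenvalues of a sample correlation matrix above 1, whereas parallel analysis sets the bar from random data of the same dimensions.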

The major problem with IPC instruments is that when responses to scale items are made by members of two clinical groups, and are assumed to be responses from equivalent groups, then a presumption of measurement equivalence/invariance (M-E/I) is imposed on the data. This means one is assuming that the same latent dimension is being assessed in both populations. Rather than assuming M-E/I, the issue should be viewed as an open hypothesis. Measurement invariance is a recent innovation in health services research [20], but rationale and methods for it have been presented in a variety of sociobehavioural research outlets [21–23]. In line with these arguments, our long-term goal is to develop a scale that is suitable for administration to key professional groups commonly working in acute care hospital wards - physicians, nurses, and allied health professionals. Ultimately this goal requires evidence that scale measurement invariance exists between multiple health professional groups. The evidence should include initial findings that an instrument is useful with at least one group.

Thus, we report here on the dimensionality and construct validity of a new scale for multiple-group measurement of IPC. Convergent, discriminant, and concurrent validity were assessed with a series of comparisons between the new IPC scale and two scales that have been used to measure facets of interprofessional collaboration, namely nurse-physician relationships and attitudes toward working in health care teams.

Methods

Given the limitations described above, data were collected to support factor analytic investigations of a multiple-group measurement scale. We weighed the benefits of adapting an existing instrument against those of beginning scale development anew from a pool of candidate indicator items. Although it is commonly recommended, forming an item pool can be inefficient if it duplicates previous successful efforts and superfluous if less successful than previous efforts.

Nursing perspectives are central to a fully interprofessional approach. In particular, patient safety and quality of care in hospital settings are closely related to nursing practice, as are the sheer size of the nursing force, its relationship with medical practice (both maladaptive and complementary), and nursing shortages. Clearly, an instrument for interprofessional practice must be suitable for the nursing profession if it is to be useful for multiple groups. Lake's [24] review noted the Nurses' Opinion Questionnaire (NOQ) of the Ward Organisational Features Scales [25] as one of three leading measurement instruments for the nursing practice environment based on theoretical relevance. We adapted items appearing in several NOQ subscales to a new round robin format for use with multiple groups of health professionals. The NOQ's items tap important dimensions of IPC that are relevant for all acute care health professionals, such as discussion and movement of information among clinicians, cooperation, and conflict resolution. The relevant NOQ subscales were published with the labels Collaboration with Medical Staff, Collaboration with Other Health Care Professionals, and Cohesion Amongst Nurses [25].

NOQ Adaptations

Subscales were adapted in two significant ways. First, some items in the nurse-physician NOQ subscale did not appear in the analogous subscales for relations between nurses and other healthcare professionals. For example, the item, "doctors are willing to discuss nursing issues," did not appear in the nurse-other subscale. Some items in the nurse-other subscale did not appear in the nurse-physician subscale, such as, "treatment carried out by other health care professionals often gives me cause for concern." This item was deleted. Items in the nurse cohesion subscale were intended for nurses, but two items were deemed important to include with nurse-physician and nurse-other assessments. These were, "important information is always passed on," and, "I feel nurses do not communicate with each other as well as they should." The second of these was revised to, "It is important to communicate well with [them]." We re-wrote items to render them presentable to members of three health professional groups with identical phrasings but for changes in the naming of the target group.

The second adaptation involved three items that appeared in the NOQ's nurse-physician and nurse-other subscales and addressed essentially-identical substantive target ideas. One item was written similarly in both subscales, but the other two were not. In the nurse-physician subscale one item read, "Doctors are usually willing to take into account the convenience of the nursing staff when planning their work." The analogous item in the nurse-other subscale was phrased as, "Other health care professionals ignore the convenience of the nursing staff when planning their work." One of these is phrased in a positive direction (the first, naming doctors) and the other in a negative direction. We standardized the item by writing it in the same direction (positive) for both target groups. Another item was also re-written for the same reason. For nurse-physician relationships it read, "medical staff cooperate with the way we organise nursing," and for nurse-other relationships the item was, "other health care professionals do not co-operate with the way we organise nursing."

The survey was constructed to elicit round robin ratings, meaning that items identified other clinical provider groups explicitly. Respondents self-identified their profession; we aggregated professions to higher-level groups. This was straightforward for nurses and physicians whose credentialing and licensure bestow common training within their professions. Members of other health professions - occupational and physical therapists, pharmacists, and social workers - were aggregated into a third group, allied health staff. This group encompasses a wide range of training backgrounds that may not be accurately represented by a catchall label but the many relevant core competencies that are common to allied health professionals, nurses, and physicians suggest that these professions have important, shared characteristics [6]. Aggregating data created six rater-target combinations. Naming the rating group first and the target group second, the rater-target dyads were: physicians-nurses, physicians-allied health staff, nurses-physicians, nurses-allied health staff, allied health staff-physicians, and allied health staff-nurses. It can be seen that the round robin design causes a group member to rate both other groups. In this respect respondents are proxy reporters on the collaboration behaviours of group targets. The round robin design also makes each group a target of two other groups.
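The six rater-target combinations are simply the ordered pairs of distinct groups. A small sketch, using the group labels from the text (the variable names are illustrative assumptions):

```python
from itertools import permutations

groups = ["physicians", "nurses", "allied health staff"]

# Ordered pairs (rater, target) of distinct groups: 3 * 2 = 6 dyads.
dyads = list(permutations(groups, 2))

# Each group rates both other groups and is rated by both other groups.
targets_per_rater = {g: [t for r, t in dyads if r == g] for g in groups}
raters_per_target = {g: [r for r, t in dyads if t == g] for g in groups}

for rater, target in dyads:
    print(f"{rater}-{target}")
```

With three groups the design stays small, but the count grows as n(n - 1), which is one practical reason for aggregating professions into higher-level groups.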

Four response options were available for the items: strongly disagree (1), disagree (2), agree (3), and strongly agree (4). Five items were written in a negative direction; higher-numbered responses to these items - agreeing or strongly agreeing - represent an opinion that IPC was qualitatively worse. Numeric responses to negative items were recoded to align with positively-phrased items.
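On a 1 to 4 scale, the recoding of negatively phrased items amounts to subtracting each response from 5. A minimal sketch follows; the item identifiers and the set of negative items are hypothetical placeholders, since the text does not list which five items were negatively worded.

```python
def reverse_code(response, low=1, high=4):
    """Mirror a response on a low..high scale (1<->4, 2<->3 by default)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}..{high}")
    return low + high - response

def recode_responses(responses, negative_items):
    """Return a copy of responses with negatively phrased items reversed."""
    return {
        item: reverse_code(value) if item in negative_items else value
        for item, value in responses.items()
    }

# Hypothetical respondent and hypothetical negative item identifiers.
raw = {"item01": 4, "item02": 1, "item03": 3}
recoded = recode_responses(raw, negative_items={"item02", "item03"})
print(recoded)
```

After recoding, a higher number means better perceived collaboration on every item, so item scores can be summed or modeled together.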

Participants

The adapted IPC scale and other scales were administered to regulated health professionals in 15 community and teaching hospitals in Ontario, Canada as part of several independent IPC projects that occurred between 2006 and 2008. The projects had various objectives, such as intervention and evaluation, simulation, and organizational climate measurement. Survey completion was voluntary at all sites and was not linked either with occupational advancement or censure. Respondents received small incentives at some sites but not all, because compensation was determined within the context of independent research protocols and budgets. Survey responses were always confidential and de-identified from personal names and other identifying information. Approval for the study was granted by research ethics committees of the University of Toronto, the Humber Institute of Technology and Advanced Learning, Chatham-Kent Health Alliance, Children's Hospital of Western Ontario, the Hospital for Sick Children, Hotel Dieu Hospital, Kingston General Hospital, Lakeridge Health Corporation, London Health Sciences Centre, North York General Hospital, Rouge Valley Health System, St. Mary's General Hospital, the Scarborough Hospital, Sunnybrook Health Sciences Centre, Thunder Bay Regional Health Sciences Centre, Toronto East General Hospital, and Trillium Health Centre.

Exploratory and confirmatory factor analysis

Because the properties of scales can change after being adapted [26], we performed factor analyses to investigate the properties of the instrument. Subscales were extracted from 14 items using 3 factor analysis steps and data combinations. In the first step exploratory factor analysis was conducted with data from 7 hospital sites (1 academic and 6 community hospitals) and responses from nurses evaluating physicians. Responses were treated as ordered categorical and were analyzed using the WLSMV (weighted least squares with mean- and variance-adjusted chi-square test statistic) estimator implemented in Mplus version 5.2. WLSMV is considered to be a strong estimation method for factor analysis with categorical data [27]. WLSMV provides statistical criteria to evaluate model fit in both exploratory and confirmatory modes, as well as conventional factor pattern coefficients ('loadings') and rotation methods. The main criteria for factor extraction were factor solutions based on eigenvalues > 1.0, model fit indices, and conceptual usefulness. All fit indices output by Mplus for WLSMV estimation are reported in the paper to demonstrate that model fit and evaluation were not enhanced by selective reporting of fit statistics. EFA solutions were rotated to enhance conceptual clarity on assumptions that the factors were either uncorrelated (orthogonal) or correlated (oblique). Varimax and promax rotations were examined; results from promax solutions were considered most useful and are reported. Varimax pattern coefficients are not indices of model fit and were not reported. Models were evaluated by their ability to produce subscales that (a) suggested three or more items for retention on a subscale, (b) had salient item factor loadings, (c) displayed internal consistency of items, and (d) exhibited theoretical and conceptual clarity of factors and items for measuring interprofessional collaboration.

The second factor analytic step was a hybrid of exploratory and confirmatory modes: exploratory factor analysis within a confirmatory framework (E/CFA) [27, 28]. E/CFA was employed as an intermediate step after EFA because it was not clear that moving to fully confirmatory mode was justifiable. E/CFA requires an analyst to pre-determine a number of factors and to estimate a model that loads all scale indicators on all factors. Based on exploratory results, indicator items are specified as anchor variables for factors on which they are hypothesized to load highest. Anchor items could be those that had the largest pattern coefficients in exploratory mode, for example. Factor variances are fixed to unity, factor covariances are freely estimated, non-anchor items are free to load on all factors and, unlike traditional EFA, indicator cross-loadings and residual covariances can be fixed to zero. This step was completed with data from nurses at 8 hospital corporations and sites different from those used in the EFA step (3 community and 5 academic hospitals, including 2 academic paediatric hospitals).
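The loading pattern implied by an E/CFA specification can be sketched as a mask of free and fixed parameters. The sketch below assumes the usual convention that each anchor item loads only on its own factor (its cross-loadings are fixed to zero) while all other items load freely on every factor. The anchor positions are hypothetical; the text does not state which items served as anchors.

```python
import numpy as np

def ecfa_loading_mask(n_items, anchors):
    """Boolean mask of loadings: True = freely estimated, False = fixed to 0.

    anchors: one 0-based item index per factor; the anchor item for factor f
    is allowed to load on factor f only.
    """
    n_factors = len(anchors)
    free = np.ones((n_items, n_factors), dtype=bool)
    for factor, item in enumerate(anchors):
        free[item, :] = False       # fix all loadings of the anchor item...
        free[item, factor] = True   # ...except the one on its own factor
    return free

# 14 items and 3 factors as in the text; anchor positions are made up.
mask = ecfa_loading_mask(14, anchors=[0, 5, 10])
n_free = int(mask.sum())
n_fixed = mask.size - n_free
print(n_free, n_fixed)
```

With 3 factors this fixes 6 cross-loadings; together with the 3 factor variances fixed to unity that gives 9 restrictions, the same number an EFA with 3 factors imposes, which is why the model fit of the two approaches is comparable.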

The third factor analytic step was fully confirmatory, and based on results obtained in the E/CFA analysis. Data from nurses at all hospital sites were combined for full CFA estimation. Results of fully confirmatory models were evaluated on goodness of fit, areas of localized strain, and interpretability of parameter estimates.

Validity and other measures

Convergent and discriminant validity of the IPC scale were examined by incorporating data from measurement scales that are used frequently in research on interprofessional working relationships. These were the Collegial Nurse-Physician Relations Subscale of the Nursing Work Index (NWI-NPRS) [29] and the subscales of the Attitudes Toward Health Care Teams Scale (ATHCTS) [30]. Several versions of the NWI-NPRS have been published. There is considerable item overlap between different versions, and for consistency our survey protocol used the three items that were all reported in three specifications of the NPRS [24, 29, 31]. Items are shown in the appendix. Given that the new IPC scale was adapted from one designed to measure nurses' views of nursing relationships with physicians, a substantial correlation between nurse responses about physicians on the IPC scale and the NWI-NPRS was expected. The amount of correlation was taken as an indicator of the convergent validity of the IPC scale with the NWI-NPRS.

The ATHCTS consists of 3 subscales to measure self-reported facets of attitudes toward collective teamworking in health care groups. The subscales contain 21 items in total and have been named Attitudes Toward Team Value, Attitudes Toward Team Efficiency, and Attitudes Toward Shared Leadership/Physician Centrality in prior literature. Higher scores on the Shared Leadership subscale indicate greater endorsement of distributed decision-making and less belief that work of nurses and other professionals should be performed principally to support medical dominance in decision-making. ATHCTS items measure attitudes, beliefs and opinions more than actual working practices. Low or moderate correlations between the other-directed IPC and self-appraisal ATHCTS subscales were expected. Low correlations were taken to indicate discriminant validity of the IPC scale. As well, low correlations were expected between the NWI-NPRS and the ATHCTS subscales. Scale intercorrelations were estimated by confirmatory factor analysis.

Concurrent validity was examined by performing all pairwise hospital site comparisons of mean scale score differences for the NWI-NPRS and IPC subscales using nurses' ratings of physicians. Significantly different sitewise comparisons on the scales were examined. The extent of overlap in sitewise mean differences between the new IPC scale and the established NWI-NPRS should indicate whether the IPC scales have concurrent validity with a measure that is highly relevant for the nursing work environment like the NWI-NPRS. Individual survey respondents were conceptualized as being nested within hospitals and a multilevel model was estimated with one level-2 predictor (hospital) and no random effects. Written in composite form, the statistical model was:

(1)

This is a means-as-outcomes model. Least squares means for hospitals were estimated and compared using PROC MIXED in SAS 9.1.
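A model of the kind described (one hospital predictor at level 2, no random effects) can be sketched in generic notation. The symbols below are assumptions chosen for illustration and are not taken from the original equation: Y_ij is the scale score of respondent i in hospital j, D_hj is an indicator equal to 1 when hospital j is hospital h, H is the number of hospitals, and r_ij is the respondent-level residual.

```latex
Y_{ij} = \gamma_{00} + \sum_{h=1}^{H-1} \gamma_{0h} D_{hj} + r_{ij},
\qquad r_{ij} \sim N(0, \sigma^{2})
```

Under this sketch the mean for the reference hospital is \gamma_{00}, the mean for hospital h is \gamma_{00} + \gamma_{0h}, and a pairwise sitewise comparison between hospitals h and h' tests \gamma_{0h} - \gamma_{0h'}.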

Hospital-level reliability of IPC measures was examined by two methods. First, nurses' scores on the IPC and NWI-NPRS scales were summed and aggregated to the hospital-site level. Polychoric correlations [32, 33] were calculated between responses to the IPC subscales' indicators, and the average interitem correlation was examined for each hospital site. Second, interrater reliability of nurse responses across hospital sites was evaluated by the intraclass correlation, ICC(1, k), computed with the SAS macro %INTRACC; the statistic reported is the one the macro labels "Shrout-Fleiss reliability: mean k scores." The ICC estimates stability of data at the hospital level. It indexes mean rater reliability of hospital-level data and is interpreted as the extent to which similar mean scores would be obtained if additional respondent samples were taken repeatedly from hospitals. It has been recommended that both average interitem correlation and ICC(1, k) should exceed .60 to justify group comparisons, i.e., between-hospital comparisons in this case [34].
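For readers without access to the SAS macro, ICC(1, k) can be computed from a one-way analysis of variance as (BMS - WMS) / BMS, where BMS and WMS are the between-target and within-target mean squares. The sketch below assumes a balanced layout (the same number of raters per target) and uses small illustrative ratings, not study data; it is not a reimplementation of %INTRACC.

```python
import numpy as np

def icc_1k(scores):
    """ICC(1, k) for a balanced targets-by-raters matrix (one-way random model)."""
    scores = np.asarray(scores, dtype=float)
    n_targets, k = scores.shape
    grand_mean = scores.mean()
    target_means = scores.mean(axis=1)
    # Between-target mean square.
    bms = k * np.sum((target_means - grand_mean) ** 2) / (n_targets - 1)
    # Within-target mean square.
    wms = np.sum((scores - target_means[:, None]) ** 2) / (n_targets * (k - 1))
    return (bms - wms) / bms

# Illustrative ratings: 6 targets (e.g. hospitals) by 4 raters each.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_1k(ratings)
meets_threshold = icc > 0.60
print(round(icc, 3), meets_threshold)
```

In this illustration the mean ratings would not be stable enough to justify comparing targets under the .60 recommendation. Real survey data are rarely balanced across hospitals, so an unbalanced estimator would be needed in practice.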

Results

Construct validity: factor analysis

Exploratory factor analysis was conducted with raw data input to Mplus. Mplus computes a polychoric correlation matrix for ordered categorical data. The number of cases used in the analysis of nurses evaluating physicians was 144. The solution revealed 3 eigenvalues greater than 1.0 (6.17, 1.38, and 1.21); therefore, solutions with 1, 2, and 3 factors were examined. For 1-factor and 2-factor solutions with promax rotation, χ² statistics for tests of model fit were 82.412, d.f. = 30, p < .001, and 50.928, d.f. = 26, p < .001. Root mean square error of approximation (RMSEA) and root mean square residual (RMSR) values were .112 (RMSEA) and .903 (RMSR) for a 1-factor solution and .082 (RMSEA) and .073 (RMSR) for a 2-factor solution. The 2-factor solution was preferred over the 1-factor solution based on model fit indices. However, four factor pattern coefficients cross-loaded on the two factors (>.30) in the 2-factor solution and model fit was not deemed satisfactory, so we considered the 3-factor solution. The 3-factor model fit better than others (χ² test of model fit = 41.61, d.f. = 25, p = .027; RMSEA = .065; RMSR = .06). Simple structure [35] was obtained. Factor pattern coefficients for items are presented in Table 1. Most items loaded >.30 on 1 factor only and weakly or negatively on others; however, 2 items had moderate cross-loadings on 2 factors (items 6 and 8). Item 14 did not load high enough (>.30) to justify placement on any factor. The 3-factor solution was retained and submitted to exploratory factor analysis within the confirmatory framework (E/CFA) [27].
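The reported eigenvalues can be read as shares of total variance, because the eigenvalues of a correlation matrix sum to the number of variables. The short calculation below assumes all 14 items entered the analysis and treats the three reported values as the only eigenvalues above 1.0; the variable names are illustrative.

```python
eigenvalues = [6.17, 1.38, 1.21]  # the three values reported in the text
n_items = 14                      # eigenvalues of a 14 x 14 correlation matrix sum to 14

# Kaiser-Guttman count: eigenvalues greater than 1.0.
kaiser_count = sum(ev > 1.0 for ev in eigenvalues)

# Share of total variance attributable to each of the three dimensions.
shares = [ev / n_items for ev in eigenvalues]
cumulative_share = sum(shares)

for ev, share in zip(eigenvalues, shares):
    print(f"eigenvalue {ev:.2f}: {share:.1%} of total variance")
print(f"first three combined: {cumulative_share:.1%}")
```

The first dimension accounts for roughly 44% of the variance and the second and third for under 10% each, which is consistent with a dominant general dimension plus two smaller ones. Since the matrix is polychoric, these shares refer to the latent response variables rather than the observed four-category responses.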

Table 1 EFA 3-factor solution: Promax rotated loadings
