Comparison of two dependent within subject coefficients of variation to evaluate the reproducibility of measurement devices
- Equal contributors
1 Department of Epidemiology and Biostatistics, Schulich School of Medicine, University of Western Ontario, London, Ontario, Canada
2 Department of Biostatistics and Epidemiology, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
3 Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
BMC Medical Research Methodology 2008, 8:24 doi:10.1186/1471-2288-8-24Published: 22 April 2008
The within-subject coefficient of variation and intra-class correlation coefficient are commonly used to assess the reliability or reproducibility of interval-scale measurements. Comparison of reproducibility or reliability of measurement devices or methods on the same set of subjects comes down to comparison of dependent reliability or reproducibility parameters.
In this paper, we develop several procedures for testing the equality of two dependent within-subject coefficients of variation computed from the same sample of subjects, which is, to the best of our knowledge, has not yet been dealt with in the statistical literature. The Wald test, the likelihood ratio, and the score tests are developed. A simple regression procedure based on results due to Pitman and Morgan is constructed. Furthermore we evaluate the statistical properties of these methods via extensive Monte Carlo simulations. The methodologies are illustrated on two data sets; the first are the microarray gene expressions measured by two plat- forms; the Affymetrix and the Amersham. Because microarray experiments produce expressions for a large number of genes, one would expect that the statistical tests to be asymptotically equivalent. To explore the behaviour of the tests in small or moderate sample sizes, we illustrated the methodologies on data from computer-aided tomographic scans of 50 patients.
It is shown that the relatively simple Wald's test (WT) is as powerful as the likelihood ratio test (LRT) and that both have consistently greater power than the score test. The regression test holds its empirical levels, and in some occasions is as powerful as the WT and the LRT.
A comparison between the reproducibility of two measuring instruments using the same set of subjects leads naturally to a comparison of two correlated indices. The presented methodology overcomes the difficulty noted by data analysts that dependence between datasets would confound any inferences one could make about the differences in measures of reliability and reproducibility. The statistical tests presented in this paper have good properties in terms of statistical power.