Copula based prediction models: an application to an aortic regurgitation study
1 Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center, Riyadh 11211, Saudi Arabia
2 Department of Mathematics, University of Northern British Columbia, Prince George, BC, Canada
3 Department of Epidemiology and Biostatistics, Schulich School of Medicine, University of Western Ontario, London, Ontario, Canada
BMC Medical Research Methodology 2007, 7:21 doi:10.1186/1471-2288-7-21Published: 16 June 2007
An important issue in prediction modeling of multivariate data is the measure of dependence structure. The use of Pearson's correlation as a dependence measure has several pitfalls and hence application of regression prediction models based on this correlation may not be an appropriate methodology. As an alternative, a copula based methodology for prediction modeling and an algorithm to simulate data are proposed.
The method consists of introducing copulas as an alternative to the correlation coefficient commonly used as a measure of dependence. An algorithm based on the marginal distributions of random variables is applied to construct the Archimedean copulas. Monte Carlo simulations are carried out to replicate datasets, estimate prediction model parameters and validate them using Lin's concordance measure.
We have carried out a correlation-based regression analysis on data from 20 patients aged 17–82 years on pre-operative and post-operative ejection fractions after surgery and estimated the prediction model: Post-operative ejection fraction = - 0.0658 + 0.8403 (Pre-operative ejection fraction); p = 0.0008; 95% confidence interval of the slope coefficient (0.3998, 1.2808). From the exploratory data analysis, it is noted that both the pre-operative and post-operative ejection fractions measurements have slight departures from symmetry and are skewed to the left. It is also noted that the measurements tend to be widely spread and have shorter tails compared to normal distribution. Therefore predictions made from the correlation-based model corresponding to the pre-operative ejection fraction measurements in the lower range may not be accurate. Further it is found that the best approximated marginal distributions of pre-operative and post-operative ejection fractions (using q-q plots) are gamma distributions. The copula based prediction model is estimated as: Post -operative ejection fraction = - 0.0933 + 0.8907 × (Pre-operative ejection fraction); p = 0.00008 ; 95% confidence interval for slope coefficient (0.4810, 1.3003). For both models differences in the predicted post-operative ejection fractions in the lower range of pre-operative ejection measurements are considerably different and prediction errors due to copula model are smaller. To validate the copula methodology we have re-sampled with replacement fifty independent bootstrap samples and have estimated concordance statistics 0.7722 (p = 0.0224) for the copula model and 0.7237 (p = 0.0604) for the correlation model. The predicted and observed measurements are concordant for both models. The estimates of accuracy components are 0.9233 and 0.8654 for copula and correlation models respectively.
Copula-based prediction modeling is demonstrated to be an appropriate alternative to the conventional correlation-based prediction modeling since the correlation-based prediction models are not appropriate to model the dependence in populations with asymmetrical tails. Proposed copula-based prediction model has been validated using the independent bootstrap samples.