Testing the treatment effect on competing causes of death in oncology clinical trials
1 Gustave Roussy, Service de Biostatistique et d’Épidémiologie, F-94805 Villejuif, France
2 Université Paris-Sud, F-94805 Villejuif, France
BMC Medical Research Methodology 2014, 14:72 doi:10.1186/1471-2288-14-72
Published: 29 May 2014
Chemotherapy is expected to reduce cancer deaths (CD), while possibly being harmful in terms of non-cancer deaths (NCD) because of toxicity. Peto’s log-rank test is popular in the medical literature, but little is known about its operating characteristics. We compared this test to the two most common ones in the statistical literature: the cause-specific hazard test and Gray’s test on the subdistribution hazard. We investigated for the first time the impact of reclassification of causes of death (CoD) after recurrences, and of misclassification of CoD.
We present a simulation study in which we varied the censoring rate and the correlation between CD and NCD times, generated recurrence times to study the role of the reclassification of CoD, and added 20% misclassified CoD. We considered four scenarios for the treatment effect: none; none for CD and negative for NCD; positive for CD and none for NCD; positive for CD and negative for NCD. We applied the three tests to a randomized clinical trial evaluating adjuvant chemotherapy in 1,867 patients with non-small-cell lung cancer.
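To make the simulation design concrete, the following is a minimal sketch of how one such trial could be generated. It is not the authors' code. The exponential latent times, the Gaussian copula used to correlate CD and NCD times, the hazard rates, and the function name simulate_trial are all assumptions chosen for illustration. Only the 20% misclassification rate comes from the text, and recurrence times are left out.

```python
import math
import numpy as np

def simulate_trial(n=1000, rho=0.5, hr_cd=0.7, hr_ncd=1.3,
                   base_cd=0.10, base_ncd=0.03, cens_rate=0.05,
                   p_misclass=0.20, seed=0):
    """Simulate one two-arm trial with two competing causes of death.

    All distributional choices here are illustrative assumptions:
    exponential latent times, Gaussian copula, exponential censoring.
    Cause codes: 0 = censored, 1 = cancer death (CD), 2 = non-cancer death (NCD).
    """
    rng = np.random.default_rng(seed)
    arm = rng.integers(0, 2, size=n)  # 1:1 randomisation, 1 = treated

    # Correlated uniforms via a Gaussian copula with correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    u = np.clip(norm_cdf(z), 1e-12, 1.0 - 1e-12)

    # Treatment multiplies each latent hazard by its own hazard ratio.
    rate_cd = base_cd * np.where(arm == 1, hr_cd, 1.0)
    rate_ncd = base_ncd * np.where(arm == 1, hr_ncd, 1.0)
    t_cd = -np.log1p(-u[:, 0]) / rate_cd      # inverse-CDF exponential draw
    t_ncd = -np.log1p(-u[:, 1]) / rate_ncd
    t_cens = rng.exponential(1.0 / cens_rate, size=n)

    # Observed time is the earliest of CD, NCD and censoring.
    time = np.minimum(np.minimum(t_cd, t_ncd), t_cens)
    cause = np.where(time == t_cens, 0, np.where(t_cd <= t_ncd, 1, 2))

    # Swap the recorded cause (1 <-> 2) for a random fraction of deaths.
    flip = (cause > 0) & (rng.random(n) < p_misclass)
    observed = np.where(flip, 3 - cause, cause)
    return arm, time, cause, observed
```

Under these assumptions, setting hr_cd and hr_ncd to 1 gives the "no effect" scenario, while hr_cd below 1 together with hr_ncd above 1 gives the scenario with a positive effect on CD and a negative effect on NCD. Raising cens_rate increases the proportion of censored patients.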
In most settings the three tests preserved their nominal size, but Gray’s test did not when the treatment had an effect on the competing CoD. With a high rate of misclassified CoD, Gray’s and the cause-specific tests lost much of their power, whereas Peto’s test had the highest power. The cause-specific test had inflated size for NCD when the treatment was beneficial for CD and many CoD were misclassified, but it had the highest power for NCD when the treatment had no effect on CD, and power similar to that of Peto’s test for CD when the treatment had no effect on NCD. Gray’s test performed best when the effects on the two CoD were opposite. The higher the censoring, the lower the rejection probabilities of all the tests and the smaller the differences between them.
In this first head-to-head comparison of the three tests, the cause-specific test often proved to be the most reliable. Comparing results with and without misclassification of the CoD, Peto’s test was the least influenced by the presence of such misclassification.