Laboratory of Clinical Neurophysiology, Scientific Institute (IRCCS) S. Maria Nascente, don C. Gnocchi Foundation, Via Capecelatro 66, Milan, 20148, Italy
Multiple Sclerosis Rehabilitation Unit, Scientific Institute (IRCCS) S. Maria Nascente, don C. Gnocchi Foundation, Via Capecelatro 66, Milan, 20148, Italy
Neurological Rehabilitation Unit, Scientific Institute (IRCCS) S. Maria Nascente, don C. Gnocchi Foundation, Via Capecelatro 66, Milan, 20148, Italy
Department of Agriculture Food and Environmental Economics, University of Milan, Via G. Celoria 2, Milan, 20133, Italy
Abstract
Background
The prognostic value of evoked potentials (EPs) in multiple sclerosis (MS) has not been fully established. The correlations between the Expanded Disability Status Scale (EDSS) at First Neurological Evaluation (FNE) and the duration of the disease, as well as between EDSS and EPs, have influenced the outcome of most previous studies. To overcome these confounding relationships, we propose to test the prognostic value of EPs within an appropriate patient population, consisting of patients with low EDSS at FNE and short disease duration.
Methods
We retrospectively selected a sample of 143 early relapsing remitting MS (RRMS) patients with an EDSS < 3.5 from a larger database spanning 20 years. By means of bivariate logistic regressions, the best predictors of worsening were selected among several demographic and clinical variables. The best multivariate logistic model was statistically validated and prospectively applied to 50 patients examined during 2009–2011.
Results
The Evoked Potentials score (EP score) and the Time to EDSS 2.0 (TT2) were the best predictors of worsening in our sample (Odds Ratio 1.10 and 0.82 respectively, p=0.001). Low EP score (below 15–20 points), short TT2 (shorter than 3–5 years) and their interaction proved to be the most useful for the identification of worsening patterns. Moreover, in patients with an EP score at FNE below 6 points and a TT2 greater than 3 years the probability of worsening was 10% after 4–5 years and rapidly decreased thereafter.
Conclusions
In an appropriate population of early RRMS patients, the EP score at FNE is a good predictor of disability at low values as well as in combination with a rapid buildup of disability. Interestingly, an EP score at FNE under the median together with a clinical stability lasting more than 3 years turned out to be a protective pattern. This finding may contribute to an early identification of benign patients, well before the term required to diagnose Benign MS (BMS).
Background
In the neuroimaging era the role of evoked potentials as diagnostic tools has been greatly diminished. This led many authors to explore how Evoked Potentials (EPs) could still be useful as predictors of clinical disability in multiple sclerosis (MS).
Our aim was to evaluate the predictive value of the EP score first by determining how an appropriate patient population should be defined, and then by assessing the performance of the EP score in a multivariate logistic regression analysis. As the EP score summarizes the quantitative information of different EP modalities, it is an optimal tool to evaluate the overall subclinical impairment of MS patients. By analyzing its performance within an appropriate patient population it should be possible to better evaluate the unbiased ability of this score to predict worsening in MS. In particular, the identification of a group with a low risk of worsening may contribute to an early identification of putative benign MS patients.
Methods
Patients
A total of 143 MS patients, who were referred to our centre (Scientific Institute S. Maria Nascente, don Gnocchi Foundation, Milan, Italy) during the period 1989–2009 for clinical, neuroimaging and neurophysiological assessments, were retrospectively selected from our clinical database. Inclusion criteria for this study were: (1) a diagnosis of Relapsing Remitting MS (RRMS) using Poser
At FNE, 10 patients (7%) were receiving immunomodulatory treatments and 16 (11%) immunosuppressive treatment; however, patients with and without treatments did not differ in terms of EDSS, EP score and disease duration at FNE.
Patients with incomplete EP tests or missing EDSS values, as well as patients with EDSS ≥ 3.5 at the FNE were not included. EDSS was considered only if assessed during periods of clinical stability.
This study was approved by the ethics committee of the Scientific Institute S. Maria Nascente of Milan and was performed in accordance with the ethical standards of the Declaration of Helsinki.
The EP score
VEP, BAER, SEP-LL and SEP-UL were recorded according to recommended standardized protocols
Statistics
First, the associations of the outcome variable L_EDSS with EP score, F_EDSS and TT2, were analyzed by the Spearman’s rank correlation coefficient. In addition, the associations between L_EDSS and other clinical characteristics at FNE were examined. Receiver operating characteristic curves (ROC) for the clinical variables (EP score, EDSS, TT2) were used to predict whether clinical worsening, defined as crossing the threshold of EDSS 3.5, occurred before 2008–2009. The area under the curve (AUC) and the 95% confidence intervals (CI) for each variable were also considered. In line with C. Renoux
A change of 1.0 EDSS point is another measure of worsening proposed in literature
Second, we sought to ascertain whether a combination of EP score and other clinical variables could improve the prediction of clinical worsening by using a multivariate logistic regression analysis. To identify other potential predictors of clinical worsening we performed a backward selection starting with a model that contained all demographical (age at FNE, gender, disease duration) and clinical variables (F_EDSS, F_EP score, TT2) with p < 0.2 on bivariate logistic regressions. In the resulting multivariate logistic regression model, variables with p-values ≥ 0.2 were eliminated, leaving the F_EP score, F_EDSS and TT2 as the best predictors of disability. Given the multicollinearity between F_EDSS and TT2, which cast doubt on their independence, two models were tested: (model 1) with F_EP score and TT2 and (model 2) with F_EP score and F_EDSS. The final choice of Model 1 (hereinafter referred to as “the model”) was based on likelihood ratio chi-squares, Akaike’s information criteria (AIC) and Bayesian information criteria (BIC). ROC curves were used again to assess the best cutoff point of the model in terms of sensitivity and specificity, and the resulting AUC was compared with those of the previous bivariate ROC analyses. The distribution of predicted probabilities for each variable was evaluated by plotting each regressor against the probability of reaching EDSS 3.5. To better assess how the final variables might interact, the predicted probabilities were also analyzed by dividing the whole sample by the median value of the variable TT2. The EP scores of the resulting two subgroups (i.e. patients below and above the median of TT2) were plotted against the probability of reaching EDSS 3.5 to assess whether the predicted probability curves had a similar shape (i.e. no effect of TT2 subgroups over the EP score) or whether differences occurred between subgroups. The same procedure was used after dividing the whole sample by the median value of the EP score.
Third, the model was validated by using a nonparametric bootstrap analysis and was prospectively applied to a group of 50 patients (11 follow-up patients and 39 newly selected cases) examined during 2009–2011 to show its practical utility.
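The comparison between the two candidate models by AIC and BIC can be sketched as follows. This is only an illustration under stated assumptions: the log-likelihood values are made up (the study does not report them here), the function and variable names are hypothetical, and each model is assumed to have three parameters (intercept plus two regressors) fitted on the 143 patients.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

N_OBS = 143    # sample size reported in the study
N_PARAMS = 3   # assumption: intercept + two regressors per model

# Hypothetical log-likelihoods, for illustration only (not study results).
log_likelihoods = {
    "model 1 (F_EP score + TT2)": -70.0,
    "model 2 (F_EP score + F_EDSS)": -74.0,
}

scores = {
    name: (aic(ll, N_PARAMS), bic(ll, N_PARAMS, N_OBS))
    for name, ll in log_likelihoods.items()
}

# Lower AIC (and BIC) indicates the preferred model.
best = min(scores, key=lambda name: scores[name][0])
for name, (a, b) in scores.items():
    print(f"{name}: AIC = {a:.1f}, BIC = {b:.1f}")
print("preferred by AIC:", best)
```

Because both candidate models have the same number of parameters, AIC and BIC rank them identically here; the two criteria can disagree only when the models differ in complexity.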
Results
The demographical and clinical characteristics of the 143 MS patients are shown in Table
| Variable | Mean ± SD | Median (IQR) |
|---|---|---|
| Age at FNE | 31.9 ± 8.7 | 31 (25–38) |
| Time from first symptom to FNE | 4.5 ± 4.6 yrs | 3 (1–7) yrs |
| Time from FNE to 2009 | 10.5 ± 4.6 yrs | 11 (6.5–14) yrs |
| F_EP score | 7.6 ± 8.5 | 5 (2–11) |
| F_EDSS (<3.5) | 1.3 ± 0.9 | 1.5 (1–2) |
| TT2 | 6.5 ± 7.7 | 3 (0–11) |
| L_EDSS | 2.7 ± 1.7 | 2.5 (1.5–3.5) |
Spearman correlation coefficients (ρ) between EP score and EDSS ranged from weak (0.27; p<0.001) at FNE to moderate (0.41; p<0.0001) at the time of the last assessment (L_EDSS). Likewise, TT2 was strongly correlated with F_EDSS (−0.73; p<0.0001) and moderately with L_EDSS (−0.55; p<0.0001). The correlation between F_EP score and TT2 was weak (−0.25; p<0.01) and none of the correlations between L_EDSS and the demographical variables were significant. To verify whether the variability of F_EDSS and of the time elapsed from the first symptom to FNE could have influenced the correlation between EP score and disability, we repeated the correlations stratifying cases by the median EDSS value (1.5) and by the interval between disease onset and the first neurophysiologic evaluation
| Variable | ρ at FNE | p | ρ at last follow-up | p |
|---|---|---|---|---|
| (a) F_EDSS | | | | |
| ≤ 1.5 (n=94, EP score 6.4 ± 7.5, disease duration 3.7 ± 4.1) | 0.14 | 0.15 | 0.33 | 0.002 |
| > 1.5 (n=49, EP score 10 ± 9.8, disease duration 5.9 ± 5.1) | 0.39 | 0.0114 | 0.41 | 0.0064 |
| (b) Interval between MS onset and FNE | | | | |
| < 2 yrs (n=48) | 0.08 | 0.55 | 0.46 | 0.0027 |
| 2–6 yrs (n=55) | 0.20 | 0.13 | 0.34 | 0.0297 |
| > 6 yrs (n=40) | 0.47 | 0.006 | 0.46 | 0.009 |

(a) patient population divided at the median EDSS; (b) patient population stratified by the interval between onset and the first neurological evaluation (FNE); p-values are corrected for multiple comparisons.
We found a statistically significant correlation between the L_EP score and L_EDSS in both subgroups, while at FNE only the subgroup with an EDSS > 1.5 was significantly correlated with F_EP score (Table
The predictive power of the F_EP score, F_EDSS and TT2 was analyzed by means of ROC curves: the AUC was 0.72 for the F_EP score (95% CI: 0.63–0.82), 0.71 for F_EDSS (95% CI: 0.61–0.81), and 0.74 for TT2 (95% CI: 0.66–0.82).
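An AUC can be read as the probability that a randomly chosen patient who worsened has a higher score than a randomly chosen patient who did not. The sketch below computes it by direct pair counting; the scores are invented for illustration (not study data) and the function name is hypothetical.

```python
def auc_by_pair_counting(scores_worsened, scores_stable):
    """AUC as the fraction of (worsened, stable) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    wins = 0.0
    for w in scores_worsened:
        for s in scores_stable:
            if w > s:
                wins += 1.0
            elif w == s:
                wins += 0.5
    return wins / (len(scores_worsened) * len(scores_stable))

# Hypothetical EP scores at FNE (illustrative values only).
worsened = [12, 20, 5, 9]
stable = [2, 5, 7, 1]

print(auc_by_pair_counting(worsened, stable))  # 0.90625
```

For a variable such as TT2, where lower values go with worsening, the same function applies after negating the values so that a higher score again means higher risk.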
Our backward selection procedure for the multivariate logistic regression resulted in the following prediction model (Table
P(EDSS ≥ 3.5) = 1 / (1 + e^(−z)), where z = −0.8987 + 0.0955 × F_EP score − 0.1990 × TT2 (TT2 in years)
| Parameter | β Estimate | Odds ratio | Standard Error | p |
|---|---|---|---|---|
| Intercept | −0.8987 | | 0.3500 | 0.010 |
| EP score | 0.0955 | 1.10 | 0.0282 | 0.001 |
| TT2 | −0.1990 | 0.82 | 0.0587 | 0.001 |
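As a rough illustration of how the tabulated coefficients translate into odds ratios and predicted probabilities, the sketch below assumes the standard logistic form. The function names are hypothetical, and the example patient assumes TT2 = 0 for someone who already had an EDSS of 2.0 at FNE.

```python
import math

# Coefficients taken from the table above.
INTERCEPT = -0.8987
BETA_EP = 0.0955
BETA_TT2 = -0.1990

def odds_ratio(beta: float) -> float:
    """Odds ratio associated with a one-unit increase of a regressor."""
    return math.exp(beta)

def worsening_probability(ep_score: float, tt2_years: float) -> float:
    """Predicted probability of reaching EDSS 3.5 (standard logistic form)."""
    z = INTERCEPT + BETA_EP * ep_score + BETA_TT2 * tt2_years
    return 1.0 / (1.0 + math.exp(-z))

print(f"OR EP score: {odds_ratio(BETA_EP):.2f}")   # 1.10
print(f"OR TT2:      {odds_ratio(BETA_TT2):.2f}")  # 0.82

# Assumed example: F_EP score of 7, EDSS 2.0 already reached at FNE (TT2 = 0).
print(f"P(worsening): {worsening_probability(7, 0):.2f}")  # 0.44
```

Each additional EP score point multiplies the odds of worsening by about 1.10, while each additional year taken to reach EDSS 2.0 multiplies them by about 0.82.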
All the diagnostics of the logistic regression model were run to check for possible errors of specification, multicollinearity and influential observations. Predictions showed a moderate correlation with the observed values (Spearman rank ρ = 0.495, p<0.0001). The model was tested against the corresponding bivariate models including either TT2 or F_EP score to evaluate how the addition of each variable contributed to the model fitting. The goodness of fit was significantly higher in both cases (likelihood ratio χ² = 22.69, p<0.0001 for the inclusion of TT2 and likelihood ratio χ² = 14.49, p=0.0001 for the inclusion of EP score); indeed the AUC was 0.8135 (95% CI: 0.74–0.88), better than the AUCs of the EP score, F_EDSS and TT2 taken individually (p<0.02).
The sensitivity and specificity at different cutoff points for the prediction of clinical worsening are shown in Figure
ROC curve resulting from the logistic regression model. The area under the curve (AUC = 0.81) shows the sensitivity and specificity corresponding to different cutoff points of the prediction of clinical worsening (defined by patients reaching the threshold of EDSS 3.5). The best cutoff point defined by the maximum of Youden’s index corresponds to a sensitivity of 0.738 and a specificity of 0.693.
The cutoff for the predicted probability of reaching EDSS 3.5 was 0.31, the point that maximized Youden’s index (0.43) and corresponded to a sensitivity of 0.738 and a specificity of 0.693.
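Youden's index is sensitivity + specificity − 1, and the best cutoff is the one that maximizes it. A minimal sketch follows; the predicted probabilities and outcomes in the toy example are invented, and the function names are hypothetical.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

def sens_spec(probs, worsened, cutoff):
    """Sensitivity and specificity when 'p >= cutoff' predicts worsening."""
    tp = sum(1 for p, y in zip(probs, worsened) if y and p >= cutoff)
    fn = sum(1 for p, y in zip(probs, worsened) if y and p < cutoff)
    tn = sum(1 for p, y in zip(probs, worsened) if not y and p < cutoff)
    fp = sum(1 for p, y in zip(probs, worsened) if not y and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(probs, worsened):
    """Cutoff (among the observed probabilities) with the largest Youden's J."""
    return max(sorted(set(probs)),
               key=lambda c: youden_index(*sens_spec(probs, worsened, c)))

# Values reported for the model: sensitivity 0.738, specificity 0.693.
print(f"{youden_index(0.738, 0.693):.2f}")  # 0.43

# Toy data (illustrative only): predicted probabilities and observed outcomes.
toy_probs = [0.1, 0.2, 0.3, 0.35, 0.5, 0.7, 0.9]
toy_worsened = [0, 0, 0, 1, 0, 1, 1]
print(best_cutoff(toy_probs, toy_worsened))  # 0.35
```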
The predicted probabilities of the model are plotted in Figure
Distribution of predicted probabilities for EP score and TT2. A sampling plot drawn from the logistic regression model showing how the predicted probabilities (PP) of clinical worsening are distributed along the EP score scale (with TT2 held constant at the mean). A sample of EP values and their corresponding PP is reported below the graph. Shaded area represents 95% CI.
Distribution of predicted probabilities (PP) for EP score and TT2. A sampling plot drawn from the logistic regression model, showing how the predicted probabilities of clinical worsening are distributed along the TT2 variable (x-axis = yrs; with the EP score held constant at the mean). A sample of TT2 values and their corresponding PP is reported below the graph. Shaded area represents 95% CI.
A more detailed picture was obtained by dividing the whole sample by the median value of each variable (3 years for TT2 and 5 points for EP score), as detailed in the Methods. Accordingly, Figure
Plot of predicted probabilities (Y-axis) vs. the EP score (X-axis). The dotted line represents patients with TT2 > 3 yrs; the solid line represents patients with TT2 ≤ 3 yrs. The difference between the curves is largest at the origin of the EP axis and tends to decrease as the EP values grow until it eventually becomes negligible at approximately an EP score of 20.
Plot of predicted probabilities (Y-axis) vs. TT2 (X-axis). The dotted line represents patients with an EP score ≤ 5; the solid line represents patients with an EP score > 5. The difference between the curves is largest at the origin of the TT2 axis and decreases as TT2 exceeds 4–5 years.
It is worth clarifying at this point that these results were obtained by arbitrarily ending the study in 2009. This means that theoretically the prediction could be extended up to our longest retrospective analysis (20 years, from 1989 to 2009); however, as the time from FNE to the end of the study varied greatly among our patients, the actual prediction span is shorter than 20 years (10 years on average, see Table
| Prospective study | F_EP score (med; range) | F_EDSS (med; range) | Pr. < 0.4 (n/ms) | Pr. 0.4–0.6 (n/ms) | Pr. ≥ 0.6 (n/ms) |
|---|---|---|---|---|---|
| 11 pts. | 5; 0–25 | 1.5; 0–3 | 6/0 | 3/1 | 2/0 |
| 6 pts. | 1.5; 0–37 | 2.0; 0–3 | 4/0 | | 2/0 |
| 33 pts. | 6; 0–28 | 1.5; 0–3 | 26/. | 4/. | 3/. |

11 pts. = 11 patients drawn from the original patient population who returned for additional EDSS assessment during 2010–2011; 6 pts. = 6 patients with 2 examinations during 2009–2011; 33 pts. = 33 patients with only 1 EPs and EDSS assessment during 2009–2011.
Ten of the 11 patients who were reassessed once during the prospective follow-up were correctly classified by the model. The only misclassified case had an F_EP score (dating back to 1996) of 7 and an F_EDSS of 2.0, resulting in a probability of 0.44, which lies just below the model cutoff and indicates a low probability of progression toward EDSS 3.5. Although this prediction was still correct in 2009, this patient eventually progressed to EDSS 3.5 during 2011, i.e. 15 years after FNE.
All the patients with a probability exceeding 0.6 (n = 4/17), who had a mean F_EP score of 25.7 and an F_EDSS over 2.0, reached the threshold in a variable time span ranging from 1 to 20 years. On the other hand, patients with a probability lower than 0.4 (n = 10/17) and a mean F_EP score of 1.4 did not reach the threshold within a 16-year period; five of them had not reached EDSS 2.0 at the time of L_EDSS assessment, while the others had a mean TT2 of 4.2 years.
Finally, a nonparametric bootstrap analysis was carried out to validate the model using the Bias Corrected and Accelerated (BCA) method in order to estimate the 95% bootstrap confidence interval for each variable (EP score CI = 0.04 to 0.15; TT2 CI = −0.29 to −0.13). Given that the diagnostic criteria for bootstrap analysis were fully met (small differences between predicted and observed regression coefficients and standard errors, Gaussian shape of the bootstrap distribution), we interpreted these results as consistent with a successful validation of the model.
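The resampling idea behind a nonparametric bootstrap can be sketched as below. Note the assumptions: this is a plain percentile bootstrap, not the BCA method used in the study; it resamples a simple statistic (the mean) rather than refitting the logistic model; and the data and function name are hypothetical.

```python
import random
import statistics

def percentile_bootstrap_ci(values, stat=statistics.mean,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement and recompute the statistic.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(resample))
    estimates.sort()
    lower = estimates[int(n_resamples * alpha / 2)]
    upper = estimates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical TT2-like values in years (not study data).
toy_values = [0, 0, 1, 2, 3, 3, 5, 8, 11, 14]
low, high = percentile_bootstrap_ci(toy_values)
print(f"mean = {statistics.mean(toy_values):.2f}, "
      f"95% CI = ({low:.2f}, {high:.2f})")
```

To validate a regression model in the same spirit, each resample would be a set of patients drawn with replacement, and the statistic would be a coefficient of the model refitted on that resample.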
Discussion
The retrospective part of this study aimed to build up and evaluate a model combining neurophysiologic and clinical evaluations to obtain a reliable prediction of the progression of disability in MS patients with particular attention to the role of evoked potentials. A summary score considering both abnormalities of latencies as well as of morphology and of amplitude symmetry of the principal EP components
At FNE, our patients showed a correlation between EP score and EDSS which was lower compared to that reported by Invernizzi et al., Leocani et al., and Kallman et al.’s group 2
Correlations between EDSS and EP scores in the literature of the last 6 years. The correlations between EDSS and EP scores reflect the researchers’ choice of patient selection criteria. 1: Kallmann et al. 2006
Second, disease duration also impacts the degree of clinical disability and, consequently, the correlation between clinical and subclinical measures. This is clearly shown in Figure
Moreover, Hughes et al.
An early MS diagnosis, in addition to being preferred by MS patients
Consequently, a logistic regression model including the F_EP score as well as the TT2 variable was applied to a sample of 143 RRMS patients having a mean disease duration of 4.5 years and a mean F_EDSS of 1.3. The aim of the model was to predict the progression of disability defined as the risk of reaching the threshold of EDSS 3.5
On the other hand, high EP scores (over 20–25 points) or a long time to reach EDSS 2.0 (over 10–15 years) were not associated with very different probabilities of worsening among the subgroups obtained by dividing the whole sample by the median value of TT2 (Figure
The EP score and TT2 have the greatest utility when their values are able to show different patterns of worsening. By dividing our sample into 4 groups, namely (a) high F_EP score + short TT2; (b) high F_EP score + long TT2; (c) low F_EP score + short TT2 and (d) low F_EP score + long TT2, we showed that our model can identify separate patterns. Groups (a) and (b) have in common a high subclinical impairment and therefore are candidates for clinical worsening whatever the conversion time to clinical disability (as shown by overlapping solid and dotted lines approximately in the last 30 values of the x-axis in Figure
As recently underlined by Schlaeger et al.
In the prospective part of this work we evaluated the risk of progression to EDSS 3.5 by applying our model to data partially obtained during the period 2009–2011. The outcome was correctly predicted by the model in 16 of the 17 patients who completed the two-year follow-up; the subject who was misclassified received a prediction close to 0.5. To improve the usefulness of the model and reduce false negatives, we are paying special attention to patients with a predicted probability in the range between 0.4 and 0.6. Four of the 33 patients who were assessed only once during 2009–2011 fulfilled this requirement and are now being closely monitored.
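The triage by predicted probability described above can be sketched as a simple banding rule. The thresholds (0.4 and 0.6) come from the text; the band labels and function name are hypothetical.

```python
def risk_band(probability: float) -> str:
    """Assign a predicted probability of reaching EDSS 3.5 to a band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < 0.4:
        return "low"
    if probability < 0.6:
        return "monitor closely"  # uncertain zone between 0.4 and 0.6
    return "high"

# The misclassified patient received a probability of 0.44.
print(risk_band(0.44))  # monitor closely
```

Treating the 0.4–0.6 range as its own band keeps borderline predictions from being read as a confident negative.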
Conclusions
In conclusion, we showed that a logistic regression model combining clinical and neurophysiologic data collected at FNE from early RRMS patients can be a reliable tool to identify patterns of prognosis in everyday clinical practice. Furthermore, we have been able to identify a pattern that could improve the definition of BMS using both EP score and EDSS progression. We have also discussed why a model centered on an appropriate patient population, i.e. RRMS with low disease duration and low F_EDSS, is to be preferred to models derived from samples with higher disease duration, higher F_EDSS and progressive MS courses which lead to more accurate but less practical predictions. We strongly believe that heuristic rather than esthetic results are to be pursued and that the real challenge arena for prediction models in MS is the early phase of the disease when divergence among clinical and neurophysiologic measures is still important, allowing the EP score to express its unbiased potentiality.
Abbreviations
EPs: Evoked potentials; MS: Multiple sclerosis; EDSS: Expanded disability status scale; FNE: First neurological evaluation; TT2: Time to EDSS 2.0; BMS: Benign multiple sclerosis; EP score: Evoked potentials score; F_EP score: EP score at FNE; L_EP score: Last EP score assessment; RRMS: Relapsing remitting multiple sclerosis; F_EDSS: EDSS at FNE; L_EDSS: Last EDSS assessment; VEP: Visual evoked potentials; BAER: Brainstem auditory evoked potentials; SEP: Somatosensory evoked potentials; UL: Upper limbs; LL: Lower limbs; CI: Confidence intervals; ROC: Receiver operating characteristic; AUC: Area under the ROC curve; AIC: Akaike's information criteria; BIC: Bayesian information criteria; BCA: Bias corrected and accelerated method; SE: Standard error; MRI: Magnetic resonance imaging; DSS: Disability status scale.
Competing interests
The authors declare that they have no conflict of interest.
Authors' contributions
NM conceived and performed computational analysis, statistical analysis, results analysis and participated in manuscript discussion, manuscript preparation, manuscript writing and manuscript review; LM carried out patients recruitment and clinical assessments, participated in results analysis and discussion, helped in manuscript preparation and manuscript review; MG participated in database preparation and analysis, and performed a review of neurophysiologic tests; EG carried out patient collection and performed neurophysiologic tests; EC helped in statistical analysis and manuscript review; RN participated in results analysis, discussion and manuscript review; LP conceived the study, participated in its design, coordinated neurophysiologic analysis, performed results analysis and discussion, helped in manuscript preparation and manuscript review. All authors read and approved the final manuscript.
Acknowledgements
This study was supported by the Italian Ministry of Health, Ricerca Corrente funding plan to the institutional research activity of the Scientific Institute S. Maria Nascente of the Don C. Gnocchi Foundation. The authors thank Dr. Silvia Dodaro who provided language editing.