Accounting for seasonal patterns in syndromic surveillance data for outbreak detection
1 Statistical Sciences, Mail Stop F600, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
2 Discrete Simulation Sciences, Mail Stop M997, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
BMC Medical Informatics and Decision Making 2006, 6:40 doi:10.1186/1472-6947-6-40Published: 4 December 2006
Syndromic surveillance (SS) can potentially contribute to outbreak detection capability by providing timely, novel data sources. One SS challenge is that some syndrome counts vary with season in a manner that is not identical from year to year.
Our goal is to evaluate the impact of inconsistent seasonal effects on performance assessments (false and true positive rates) in the context of detecting anomalous counts in data that exhibit seasonal variation.
To evaluate the impact of inconsistent seasonal effects, we injected synthetic outbreaks into real data and into data simulated from each of two models fit to the same real data. Using real respiratory syndrome counts collected in an emergency department from 2/1/94–5/31/03, we varied the length of training data from one to eight years, applied a sequential test to the forecast errors arising from each of eight forecasting methods, and evaluated their detection probabilities (DP) on the basis of 1000 injected synthetic outbreaks. We did the same for each of two corresponding simulated data sets. The less realistic, nonhierarchical model's simulated data set assumed that "one season fits all," meaning that each year's seasonal peak has the same onset, duration, and magnitude. The more realistic simulated data set used a hierarchical model to capture violation of the "one season fits all" assumption.
This experiment demonstrated optimistic bias in DP estimates for some of the methods when data simulated from the nonhierarchical model was used for DP estimation, thus suggesting that at least for some real data sets and methods, it is not adequate to assume that "one season fits all."
For the data we analyze, the "one season fits all " assumption is violated, and DP performance claims based on simulated data that assume "one season fits all," for the forecast methods considered, except for moving average methods, tend to be optimistic. Moving average methods based on relatively short amounts of training data are competitive on all three data sets, but are particularly competitive on the real data and on data from the hierarchical model, which are the two data sets that violate the "one season fits all" assumption.