Abstract
Background
Public health surveillance is the monitoring of data to detect and quantify unusual health events. Monitoring prediagnostic data, such as emergency department (ED) patient chief complaints, enables rapid detection of disease outbreaks. There are many sources of variation in such data; statistical methods need to model them accurately as a basis for timely and accurate disease outbreak detection methods.
Methods
Our new methods for modeling daily chief complaint counts are based on a seasonal-trend decomposition procedure based on loess (STL) and were developed using data from the 76 EDs of the Indiana surveillance program from 2004 to 2008. Square root counts are decomposed into interannual, yearly-seasonal, day-of-the-week, and random-error components. Using this decomposition method, we develop a new synoptic-scale (days to weeks) outbreak detection method and carry out a simulation study to compare detection performance to four well-known methods for nine outbreak scenarios.
Results
The components of the STL decomposition reveal insights into the variability of the Indiana ED data. Day-of-the-week components tend to peak Sunday or Monday, fall steadily to a minimum Thursday or Friday, and then rise to the peak. Yearly-seasonal components show seasonal influenza, some with bimodal peaks.
Some interannual components increase slightly due to increasing patient populations. A new outbreak detection method based on the decomposition modeling performs well with 90 days or more of data. Control limits were set empirically so that all methods had a specificity of 97%. STL had the largest sensitivity in all nine outbreak scenarios. The STL method also exhibited a well-behaved false positive rate when run on the data with no outbreaks injected.
Conclusion
The STL decomposition method for chief complaint counts leads to a rapid and accurate detection method for disease outbreaks, and requires only 90 days of historical data to be put into operation. The visualization tools that accompany the decomposition and outbreak methods provide much insight into patterns in the data, which is useful for surveillance operations.
Background
The development of statistical methods for accurately modeling disease count time series for the early detection of bioterrorism and pandemic events is a very active area of research in syndromic surveillance [1-15]. Early detection requires accurate modeling of the data.
A challenge to modeling is systematic components of time variation whose patterns have a level of predictability: interannual (long-term trend), day-of-the-week, and yearly-seasonal components [2,10]. The variation can be ascribed to causal factors: for the interannual component the factor can be an increase or decrease in the population of people from which patients come; for the yearly-seasonal component it is changing weather and human activities over the course of a year; and for the day-of-the-week component it is a changing propensity of patients to seek medical attention according to the day of the week. In addition to these components is a noise component: random errors not assignable to causal factors that can often be modeled as independent random variables.
The most effective approach to early outbreak detection is to account for the systematic components of variation and the noise component through a model, and then base a detection method on values of the systematic components and the statistical properties of the random errors. Methods that do not accurately model the components can suffer in performance. The model predicts the recurring behavior of the systematic components so it can be discounted by the detection method, and specifies the statistical properties of the counts through the stochastic modeling of the noise component. The two dangers are lack of fit (patterns in the systematic components are missed) and overfitting (the fitted values are unnecessarily noisy). Both are harmful. Lack of fit results in increased prediction squared bias, and overfitting results in increased prediction variance. The prediction mean-squared error is the sum of squared bias and variance.
Many modeling methods require large amounts of historical data. Examples are the extended baseline methods [11-15] of the Early Aberration Reporting System (EARS) [4], which require at least 3 years of data. Another popular example is a seasonal autoregressive integrated moving average (ARIMA) model in [3] that is fitted to 8 years of data. Such methods are not useful for many surveillance systems that have recently come online, so methods have been proposed that require little historical data [16]. They typically employ moving averages of recent data, such as C1-MILD (C1), C2-MEDIUM (C2), and C3-ULTRA (C3), which are used in EARS. But such methods do not provide optimal performance because they do not exploit the full information in the data; they smooth out some component effects and operate locally to avoid others, rather than accounting for the components.
Our research goal is accurate modeling of components of variation, but without requiring a large amount of historical data. The research was carried out as part of our analysis of the daily counts of chief complaints from emergency departments (EDs) of the Indiana Public Health Emergency Surveillance System (PHESS) [17]. The complaints are divided into eight classifications by CoCo [18]. Data for the first EDs go back to November 2004, and new EDs have come online continually since then. There are now 76 EDs in the system.
There are many ways to proceed in modeling chief-complaint time series. One is to develop parametric models. However, our new modeling methods are based on a nonparametric method, seasonal-trend decomposition using loess (STL) [19], which is very flexible and can account for a much wider range of component patterns than any single parametric model. Just as importantly, it can be used with as little as 90 days of data. We have also developed a new synoptic-scale (days to weeks) outbreak detection method based on the STL modeling.
Methods
Data
The STL decomposition was run on all Indiana EDs for respiratory and gastrointestinal counts. Both are fundamental markers for a number of naturally occurring diseases, and research has shown that diseases from bioweapons have early characterization of influenza-like illness [20], which typically results in respiratory complaints.
We present results for the respiratory time series for the 30 EDs that came online the earliest; these series end in April 2008, and start at times ranging from November 2004 to September 2005. All analyses were performed using the R statistical software environment [21], and an R package is available as a supplemental download [see additional file 1].
Additional file 1. R code and documentation. Contains R source code, examples, data, and documentation for carrying out the STL procedure. Format: ZIP. Size: 83 KB.
Modeling
STL decomposes a time series into components of variation [19] using a local-regression modeling approach, loess [22]. Our methods use STL to decompose the square root of ED daily counts into four components:

√Y_t = T_t + S_t + D_t + N_t     (1)

where t is time in units of years and increments daily, Y_t is the respiratory count on day t, T_t is an interannual component that models long-term trend, S_t is a yearly-seasonal component, D_t is a day-of-the-week component, and N_t is a random-error noise component.
Figure 1 shows the STL decomposition of √Y_t for one ED. T_t is nearly constant. S_t has peaks due to seasonal influenza: single peaks in early 2005, 2007, and 2008, and a double peak in late 2006 and early 2007. While some of these yearly-seasonal effects are visible in the raw data, they are seen much more effectively in S_t because the other effects, including the noise, are not present. Variation in D_t is small compared with the total variation of √Y_t, but it cannot be ignored in the modeling because its variation is not small compared with the initial growth of critical disease outbreaks. N_t is irregular in behavior, and can be accurately modeled as independent, identically distributed normal random variables with mean 0. The square root transformation results in much simpler statistical properties. For example, the standard deviations of the N_t are nearly constant within and between hospitals, and the marginal distribution is nearly normal with mean 0 and variance σ². Neither is true for the untransformed counts. The behavior of the N_t after transformation is not surprising because the raw counts can be thought of as Poisson random variables, and the square root of a Poisson random variable whose mean is not too small is approximately normal with a standard deviation of 0.5 [23].
Figure 1. STL decomposition for respiratory square root daily counts. Respiratory square root daily counts and four components of variation of the STL decomposition for an Indiana emergency department (ED). The four components sum to the square root counts. The solid vertical lines show January 1 and the dashed vertical lines show April 1, July 1, and October 1.
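The variance-stabilizing effect of the square root can be checked numerically. The following sketch (ours, not the paper's R code; Python used for illustration) simulates Poisson counts at the three baseline daily means used later in the evaluation:

```python
import numpy as np

# Simulated check of the claim above: the square root of a Poisson variable
# with a not-too-small mean is approximately normal with standard deviation 0.5.
rng = np.random.default_rng(42)
for lam in (9.25, 17.33, 23.86):   # baseline daily means used in the paper
    sd = np.sqrt(rng.poisson(lam, size=200_000)).std()
    print(f"lambda = {lam:5.2f}: sd of sqrt(counts) = {sd:.3f}")
```

Each printed standard deviation lands near 0.5 regardless of the mean, which is why the N_t have nearly constant spread within and between hospitals after the transformation.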
STL components of variation arise from smoothing the data using moving weighted-least-squares polynomial fitting with a moving window bandwidth in days. The degree of the polynomial is 0 (locally constant), 1 (locally linear), or 2 (locally quadratic).
The interannual component, T_t, arises from locally linear fitting with a bandwidth of 1000 days, a very low-frequency component. (Window bandwidths can be formally larger than the number of available data points.)
For the yearly-seasonal component, S_t, a critical idea of our method is not pooling values across years with the same time of year (values of January 1, values of January 2, etc.) as is often done. STL methodology can provide pooling, but because our methods are designed to work with limited data, S_t at a time t uses data in a neighborhood of t, and not across years. S_t is a low-frequency component with a bandwidth of 90 days and with a blending of locally quadratic and locally constant fitting. It is possible that this local method would be better in many surveillance applications even with substantially more data if the phase and shape are sufficiently variable from one year to the next. For example, for some Indiana EDs, seasonal influenza peaks are unimodal in some years and bimodal in others. Our modeling tracks this accurately, but averaging across years could easily distort the patterns, for example, making the bimodal peaks merge into unimodal ones. Other research has also found that the "one season fits all" assumption, leading to pooling across years, is not suitable for disease surveillance data [10].
The blending for the S_t results in nearly stationary noise, N_t. Local smoothing methods tend to produce systematic components that have higher standard deviation as the time approaches the ends of the data, making the noise component have a lower standard deviation at the ends. We countered this with a new smoothing method, blending. S_t results from blending locally quadratic smoothing and locally constant smoothing, a weighted average of the two for the 50 days at each end of the data. The weight for the quadratic smoother is 0.7 at the first or last observation, and increases linearly moving away from each end, becoming 1 at the 50th observation from each end. This means the weight for the constant smoother goes from 0.3 to 0.
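The boundary weighting just described can be written down directly. A minimal sketch of our reading of the rule (the function names and 0-indexing are ours, not the paper's implementation):

```python
def quadratic_weight(i, n, ramp=50, w0=0.7):
    """Weight on the locally quadratic fit at observation i (0-indexed) of n.

    0.7 at the first or last observation, rising linearly to 1.0 at the
    50th observation in from each end; 1.0 everywhere in the interior.
    """
    d = min(i, n - 1 - i)              # distance from the nearer end
    if d >= ramp - 1:
        return 1.0
    return w0 + (1.0 - w0) * d / (ramp - 1)

def blend(quad_fit, const_fit):
    """Blend locally quadratic and locally constant fits near the ends."""
    n = len(quad_fit)
    return [quadratic_weight(i, n) * q + (1 - quadratic_weight(i, n)) * c
            for i, (q, c) in enumerate(zip(quad_fit, const_fit))]
```

The constant smoother's weight is simply one minus the quadratic weight, going from 0.3 to 0 over the same 50 days.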
The fitting algorithm begins with computation of a day-of-the-week component, D_t, directly employing the method described in [19]. STL can allow D_t to slowly evolve through time, but we found that D_t is a strictly periodic component. D_t results from an iterative process. First, a low-middle-frequency component is fitted using locally linear fitting with a bandwidth of 39 days. Then D_t is the result of taking means, for each day of the week, of the √Y_t minus the low-middle-frequency component. Then the current D_t is subtracted from the √Y_t and the low-middle-frequency component is recomputed. This iterative process is continued to convergence. We subtract the final day-of-the-week component from the √Y_t, and then use loess smoothing on the result to obtain the interannual component T_t as described above. Finally, we subtract the day-of-the-week and interannual components from the √Y_t and smooth the result to obtain S_t as described above.
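The iteration can be sketched as follows (Python; a simple moving average stands in for the locally linear loess with a 39-day bandwidth, and all names are ours):

```python
import numpy as np

def moving_average(x, span=39):
    """Crude stand-in for locally linear loess with a 39-day bandwidth."""
    pad = span // 2
    xp = np.pad(x, pad, mode="edge")
    return np.convolve(xp, np.ones(span) / span, mode="valid")

def day_of_week_component(y, n_iter=10):
    """Strictly periodic day-of-the-week component for daily sqrt counts y."""
    n = len(y)
    dow = np.arange(n) % 7
    d = np.zeros(n)
    for _ in range(n_iter):
        low_mid = moving_average(y - d)              # low-middle-frequency fit
        resid = y - low_mid
        means = np.array([resid[dow == k].mean() for k in range(7)])
        d = (means - means.mean())[dow]              # centered periodic component
    return d
```

The real procedure uses loess rather than a flat moving average, but the alternating structure is the same: smooth, take day-of-the-week means of the residuals, subtract, and re-smooth until convergence.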
We selected the stable D_t assumption and the bandwidths of T_t and S_t through extensive use of visualization and numerical methods of model checking in order to prevent overfitting or a lack of fit of the components of variation, and to keep two components from competing for the same variation in the data.
Outbreak Detection
STL modeling provides public health officials with a clear view of yearlyseasonal variation for visual monitoring of the onset and magnitude of seasonal peaks. This is facilitated by the smoothness of S_{t }as opposed to the raw data.
Another critical task is the detection of outbreaks that rise up on the order of several days to several weeks and are not part of the systematic patterns in disease counts; the causes are bioterrorism and uncommon highly pernicious diseases. Following the terminology of meteorology, we will refer to this as a "synoptic timescale outbreak".
Our synoptic-scale detection method uses a simple control chart based on the assumption that the daily counts are independently distributed Poisson random variables with a changing mean λ_t. These assumptions are validated in the Results section. The variability in the systematic components T_t, S_t, and D_t is taken to be negligible, which is reasonable since they are relatively low-frequency components. We take all variability to be in N_t. From Equation 1,

λ_t = E(Y_t) = (T_t + S_t + D_t)² + σ²     (2)

where σ is estimated by the sample standard deviation of the N_t.
With Y_t as a Poisson random variable with mean λ_t, our method declares an outbreak alarm when Y_t results in a value y_t for which P(Y_t ≥ y_t; λ_t) is less than a threshold ρ. We evaluate this synoptic-scale outbreak method using the historical data for baselines and adding artificial outbreaks, as has been done in other research [24,25]. We use three baselines: respiratory daily counts from three EDs with low, medium, and high daily means of 9.25, 17.33, and 23.86.
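In code, the alarm rule is one line on top of the Poisson tail probability. A sketch (ours; the fitted mean lam_hat would come from the STL fit via Equation 2, here it is just a supplied number):

```python
from scipy.stats import poisson

def alarm(y_t, lam_hat, rho=0.03):
    """Alarm if P(Y_t >= y_t; lam_hat) < rho under the Poisson model."""
    # sf(k) = P(Y > k), so sf(y_t - 1) = P(Y >= y_t)
    return poisson.sf(y_t - 1, lam_hat) < rho

print(alarm(25, 10.0))   # a large spike over a fitted mean of 10: True
print(alarm(12, 10.0))   # an ordinary day: False
```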
The outbreak model is a lognormal epicurve of Sartwell [26]. A starting day is selected and becomes day 1. A total number of cases, O, is selected; the bigger the value of O, the more readily the outbreak can be detected. The day of each case is a random draw, rounded to the nearest positive integer, from a lognormal distribution whose minimum is 0.5. For our evaluation, with the lognormal density as

f(x) = (1 / (xσ√(2π))) exp(−(ln x − ζ)² / (2σ²))     (3)

we use ζ = 2.401 and σ = 0.4626 to approximate the temporal behavior of an anthrax outbreak [27]. The lognormal density times a constant is shown in Figure 2, which will be fully explained later.
Figure 2. Outbreak sample. Randomly generated outbreak of 76 cases injected according to the Sartwell model [26] is shown by the histogram. The lognormal density of the model multiplied by 76 is shown by the curve. The parameters of the lognormal are the mean, ζ = 2.401, and the standard deviation, σ = 0.4626, both on the natural log scale.
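Generating one such outbreak is straightforward. A sketch (ours, not the paper's code) of the sampling step, using the parameters from the text:

```python
import numpy as np

def generate_outbreak(O, zeta=2.401, sigma=0.4626, rng=None):
    """Days of O outbreak cases under the Sartwell lognormal epicurve."""
    if rng is None:
        rng = np.random.default_rng()
    draws = rng.lognormal(mean=zeta, sigma=sigma, size=O)
    days = np.maximum(np.rint(draws).astype(int), 1)   # nearest positive integer
    return np.bincount(days)    # cases_per_day[d] = outbreak cases on day d

cases_per_day = generate_outbreak(76, rng=np.random.default_rng(7))
# These daily case counts are what gets added to the baseline ED counts.
```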
Detectability at a point in time depends on the number of outbreak cases per day compared with the variability of the baseline counts over the outbreak period. As we have emphasized earlier, the variability of an ED count time series is larger when the level of the series is larger than when the level is smaller. Because O is fixed, detectability changes through time.
We used three values of O for each baseline, selected based on a method in [1]. There are many ways to make the selection, but we chose this one to remain consistent with past work. The maximum of our lognormal epicurve is 0.087 and occurs at exp(ζ − σ²) = 8.9 days. If we suppose the density holds this value for 8.9 ± 0.5 days, then the expected number of cases in this peak interval is 0.087O. Detectability can be controlled by making the expected peak-interval cases equal to a magnitude factor f times the sample standard deviation of the residual counts Y_t − (T_t + S_t + D_t)². The parameter f controls the detectability.
The three values of O in our evaluation for each baseline were chosen by this method with f = 1, 1.5, and 2. For example, for the baseline with the smallest mean daily counts, the standard deviation of the residual counts is 3.324. For f = 2, we have 2 × 3.324/0.087 = 76.414, and rounding to the nearest integer, O = 76. This case is shown in Figure 2. The curve is the lognormal density times 76, and the histogram shows 76 draws from the lognormal.
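The arithmetic for choosing O is simple enough to sketch (the function name is ours):

```python
PEAK_DENSITY = 0.087   # maximum of the lognormal epicurve, at about 8.9 days

def outbreak_size(resid_sd, f):
    """Total cases O such that expected peak-interval cases = f * resid_sd."""
    return round(f * resid_sd / PEAK_DENSITY)

# The text's example: smallest-mean baseline, residual sd 3.324, f = 2.
print(outbreak_size(3.324, 2))   # 76
```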
We compared this synoptic-scale method to other methods. Because some of our surveillance EDs have a rather small number of observations, we compared to the EARS C1, C2, and C3 methods [4]. These methods are based on a simple control chart approach with the daily test statistic calculated as

S_t = (Y_t − μ_t) / σ_t     (4)

where μ_t and σ_t are a 7-day baseline mean and standard deviation. For the C1 method, the baseline window for μ_t and σ_t is days t − 7, ..., t − 1, and for the C2 method the baseline window is days t − 9, ..., t − 3. The test statistic for the C3 method is S_t + S_{t−1} + S_{t−2}, where S_{t−1} and S_{t−2} are added if they do not exceed the threshold.
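For reference, the three EARS statistics as described above can be sketched directly (our implementation of the description, not the EARS source; y is the array of daily counts):

```python
import numpy as np

def c1_stat(y, t):
    base = y[t - 7:t]                 # days t-7 .. t-1, no lag
    return (y[t] - base.mean()) / base.std(ddof=1)

def c2_stat(y, t):
    base = y[t - 9:t - 2]             # days t-9 .. t-3, a 2-day lag
    return (y[t] - base.mean()) / base.std(ddof=1)

def c3_stat(y, t, threshold):
    """C2 for today plus the two prior days' C2 values, each past value
    included only if it did not itself exceed the threshold."""
    s = c2_stat(y, t)
    for lag in (1, 2):
        past = c2_stat(y, t - lag)
        if past <= threshold:
            s += past
    return s
```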
Since some EDs have more than 3 years of data, we also considered some of the EARS longer historical baseline methods. However, we chose not to implement any of them since it has been shown that these do not offer much of an advantage over the limited baseline methods [16]. Instead, we compared with a Poisson generalized linear model (GLM) method [24], which has terms for dayoftheweek, month, linear trend, and holidays. We did not include holidays since we were not able to find the details of its implementation. We also compared with a seasonal ARIMA model [3] but found its results unsatisfactory, so we do not include it here. This could be due to insufficient data for pooling across years, or that such pooling through the seasonal terms of the model cannot accommodate the substantial change from one year to the next in the seasonal patterns.
Outbreak detection was tested using 1004 days of each baseline. The first 365 days acted as historical data for the GLM method to have sufficient data to estimate coefficients. For each of the 9 combinations of outbreak detectability and baseline, we simulated 625 different outbreaks; the first started on day 366, the second on day 367, and so forth to the final on day 990.
Each outbreak was a different random sample from the lognormal epicurve. One sample is shown in Figure 2. The counts of each outbreak were added to the baseline counts to form Y_t. We ran each method through each day of an outbreak as would be done in real practice, and tracked whether the outbreak was detected and, if so, how long it took to detect. Time until detection is measured by the number of days after first exposure until the day of detection. Any evaluation that did not result in detection on or before day 14 of the outbreak was classified as not detected.
We also investigated the minimum amount of data needed by the STL method to achieve good performance. Because T_{t }is very stable, and the window for S_{t }is 90 days, we would expect STL to do well for 90 days or more. To test this, we ran our methods at each time point from 366 to 990 using just the most recent 90 days of baseline data for each outbreak scenario.
We ran all methods at a false positive rate of 0.03, achieved empirically by choosing a cutoff, ρ, for each method from the historical data with no outbreaks injected. For the C1, C2, and C3 methods, this means choosing ρ to be the 97th percentile of the daily test statistic values. Since the STL and GLM methods update past fitted values as they progress, choosing ρ for these methods cannot be done by simply fitting the entire time series and retrospectively choosing ρ. Instead, we use the fitted values obtained at the last day of each fit over time to choose ρ.
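For the EARS-style statistics, this empirical calibration is a percentile computation. A sketch (ours), using simulated stand-in statistics:

```python
import numpy as np

def empirical_cutoff(daily_stats, fp_rate=0.03):
    """Cutoff so that fp_rate of no-outbreak days exceed it (97th percentile)."""
    return np.quantile(daily_stats, 1.0 - fp_rate)

rng = np.random.default_rng(0)
stats = rng.normal(size=1000)      # stand-in for no-outbreak test statistics
cut = empirical_cutoff(stats)
# By construction, about 3% of baseline days exceed this cutoff.
```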
In practice, limits for the STL and GLM methods would be set by using the theoretical false positive rate. If a model fits the data, the observed rates should be consistent with the theoretical rate. We compared the GLM and STL methods by studying their observed rates for the 30 EDs using ρ = 0.03.
Results
STL Modeling
The four components of variation (D_t, T_t, S_t, and N_t) of the STL modeling of the square root respiratory counts, √Y_t, for the 30 EDs are presented in Figures 3, 4, 5, and 6. To maintain anonymity of the hospitals, names have been replaced by three-letter words, and each T_t has been altered by subtracting its mean.
Figure 3. Day-of-the-week component. Day-of-the-week component, D_t, for square root respiratory counts for 30 Indiana EDs. The general pattern is U-shaped with a Monday or Sunday maximum and a Thursday or Friday minimum.
Figure 4. Yearly-seasonal component. Yearly-seasonal component, S_t, for square root respiratory daily counts for 30 Indiana EDs. Overall, patterns are similar, but detailed behavior varies, with unimodal and bimodal peaks, different times of onset of yearly-seasonal disease, and different times of disease peaks.
Figure 5. Interannual component. Interannual component, T_t, for respiratory square root counts for 30 Indiana EDs. Each component has been centered around zero to protect anonymity. The long-term trend for each ED is either nearly constant or has a small increase due to a growing patient population.
Figure 6. Normal probability plots for N_{t}. Normal quantile plots for the noise component, N_{t}, of the square root respiratory counts for 30 EDs. The sample distribution of the noise is well approximated by the normal distribution. This is a result of the square root transformation of the counts.
The day-of-the-week components, D_t, in Figure 3 tend to peak on Sunday or Monday, fall to a minimum Thursday or Friday, and then rise. For some EDs, D_t is very high over the weekend, but for others it is considerably lower.
Figure 4 shows the yearly-seasonal component, S_t. The flu season peaks for 2004–2005 and 2007–2008 tend to be larger than those for 2005–2006 and 2006–2007. Many EDs show a double peak for 2006–2007.
Some T_t in Figure 5 show an increase and others are nearly constant. We attribute the increase to a growing population using an ED, and no growth to a stable population. Our conclusion is based in part on observing a similar pattern for gastrointestinal counts.
Diagnostic plots validated the STL modeling. One example, Figure 6, shows normal quantile plots of the N_t. The lines on the plots are drawn through the lower and upper quartile points. Because the data of the panels are close to the lines, the normal is a good approximation of the distribution. Plots of the autocorrelation functions of the N_t showed the noise component can be modeled as independent random variables. Loess smoothing of the N_t against t verified that no variation that should have been in T_t or S_t leaked into the N_t. Separate loess plots of the N_t for each day of the week verified that all significant day-of-the-week effects were captured by D_t.
Modeling on the Counts Scale
The three systematic components, T_t, S_t, and D_t, are each weighted averages of a large number of observations, so the values are comparatively stable. The statistical variability in √Y_t is chiefly the result of N_t, modeled as independent, identically distributed normal random variables with variance σ². Thus the mean of Y_t is λ_t from Equation 2.
The square root of a Poisson variable whose mean is not too small is approximately normal with a standard deviation of 0.5 [23]. The sample standard deviations of the 30 noise components are all greater than 0.5, indicating a systematic departure, but are not far from 0.5. The upper quartile of the estimates is 0.58 and the median is 0.54; two estimates reach as high as 0.63 but this is due to a few outliers. The consistency of the values can be seen in Figure 6; the slopes of the lines on the display are nearly the same.
The near normality of the N_t and the closeness of the sample standard deviations to 0.5 are consistent with the Y_t having a distribution that is well approximated by the Poisson with mean λ_t. However, in carrying out the detection method, we use the sample standard deviation in place of the theoretical value 0.5 to provide a measure of robustness to the Poisson model.
Outbreak Detection
Table 1 reports the observed sensitivity and average number of days until detection for each baseline, outbreak magnitude, and method. Figure 7 graphs sensitivity versus magnitude. The days until detection include the incubation period; referring to Figure 2, outbreak cases usually do not begin until day 3 or 4, so detection before day 6 or 7 is before the expected outbreak peak. The C1 method typically has the earliest detection, but its sensitivity is poor. The GLM and STL methods have the best sensitivity, with the STL method clearly ahead in all cases. The STL method also detects outbreaks somewhat faster than the GLM method.
Table 1. Outbreak detection simulation results.
Figure 7. Outbreak simulation results. Outbreak detection simulation results. The false positive rate was set empirically for each method and baseline to be 0.03. For each baseline, the STL method detects more than 10% more outbreaks than the other methods at the smallest magnitude.
At the lowest outbreak magnitude, the STL method outperforms all other methods by a difference of 10% sensitivity. At higher magnitudes, the gap closes, but this is not surprising because all reasonable methods should eventually converge to 100% detection as magnitude increases. Comparisons of sensitivity across baselines within each method and magnitude show quite a bit of variability. This may be due to unique characteristics of each ED, the random outbreak generation, or the fact that the cutoff values were set empirically within each baseline.
The results of running the STL method on the same date range, but using only 90 days of baseline data for each scenario, are shown in the final column of Table 1. These numbers are very comparable to the results using all available data. This indicates that the STL detection method is effective when 90 days of data are available.
Figure 8 shows quantile plots of the observed false positive rates from the GLM and STL methods for the historical data with no outbreaks and a theoretical false positive rate of 0.03 for all 30 EDs. The observed rates for the GLM methods deviate from 0.03 by far more than for STL.
Figure 8. Observed false positive rates for STL and GLM. Quantile plots of observed false positive rates for the STL and GLM methods based on a theoretical false positive rate of 0.03, from respiratory counts for each of the 30 EDs. The dashed lines represent the median value for each method.
To better understand the difference in performance across outbreak methods, we investigated the model fitted values (predicted systematic values) and residuals (data minus the fits) from each method. As we have emphasized, the methods of statistical modeling upon which each detection method is based have a large impact on performance. For the EARS methods, the fitted value for day t is the seven-day baseline raw-data mean μ_t from Equation 4. For the GLM and the STL methods, the fitted value for day t is the evaluation of the systematic components at time t for a fit through time t, just as it would be used in practice for outbreak detection. GLM bases this on data for a number of years; STL bases it on 90 days to a good approximation unless there is substantial long-term trend, which there is not in our Indiana surveillance data.
Figures 9 and 10 are two residual diagnostic plots to check for lack of fit in the EARS, GLM, and STL methods for the daily respiratory counts for one ED. Figure 9 graphs the residuals against t for each model; a loess curve shows the local average fluctuations of the residuals around 0. Figure 10 graphs the residuals against week for each day of the week, also with loess curves. Figure 9 shows substantial lack of fit for GLM: systematic fluctuations from 0. This is due to pooling across years to estimate the yearly-seasonal component, which can be quite different from one year to the next. Lack of fit revealed for EARS and STL is minor. Figure 10 shows systematic lack of fit by day of the week for EARS, but not STL and GLM. The reason is simply that EARS does not model day-of-the-week variation.
Figure 9. Residuals for model fits. Residuals for model fits to daily respiratory counts for one ED. The EARS residuals are the observed count minus the 7-day baseline mean with lag 2. The GLM and STL residuals are obtained from the model predicted values. The smooth curve is the local mean of the residuals.
Figure 10. Residuals by day-of-the-week. Residuals for model fits to daily respiratory counts for one ED, by day-of-the-week. The smooth curve is the local mean of the residuals.
Figure 11 checks the variability in the EARS, GLM, and STL fitted values for the same ED as in the previous two figures. We want the fitted values to be as smooth as possible without introducing lack of fit. For EARS, the fitted values are graphed against t. To make comparisons of EARS with GLM and STL commensurate, we graph the GLM and STL fitted values minus their day-of-the-week components against t. The EARS fit is much noisier than that of STL, the result of overfitting. In particular, there is a large amount of synoptic-scale variation, making it more difficult to detect synoptic-scale outbreaks. GLM is locally quite smooth at most points in time, but has large precipitous jumps at certain time points because the yearly-seasonal component is modeled as a step function, constant within months but changing discontinuously from one month to the next.
Figure 11. Fitted components. Fitted components for daily respiratory counts for one ED. The EARS fitted value at day t is the 7-day baseline mean with lag 2. The GLM and STL fitted values are the predicted values from fitting the models up to day t, but with the day-of-the-week components removed to make the comparison with the variability of the EARS fitted values commensurate.
Discussion
For our data, the same smoothing parameters were successfully used across all EDs. We advise those interested in applying these methods to their syndromic data to perform their own parameter selection and model validation. For outbreak detection, our data behaved like Poisson random variables, but this may not be the case in other applications. We advise checking this assumption and if it does not hold, investigating other possibilities such as the overdispersed Poisson. Another alternative is validating distributional properties of the remainder term and using this term for monitoring.
Although our detection performance results favor the STL method, this does not necessarily mean that it is the best method available. There are certainly other methods with which we could have compared. We chose the EARS methods since they are widely used and can be applied to very short disease time series. Our detection study was limited to one type of outbreak. Further use and study of this method will determine its merit. The visualization capabilities of STL modeling make it a useful tool for visual analytics.
The STL method should not be used for small counts close to and including zero. While it works well with the large counts of the respiratory and gastrointestinal categories, many other categories, such as botulinic, have counts that are too small for the square roots to be approximately normally distributed. Future work can investigate employing STL ideas for small counts by replacing the square-root normal distribution and local least-squares fitting with the Poisson distribution and local Poisson maximum-likelihood fitting.
Conclusion
The STL decomposition methods presented here effectively model chief complaint counts for syndromic surveillance without significant lack of fit or undue noise, and lead to a synoptic-timescale disease outbreak detection method that, in our testing scenarios, performs better than the other methods to which it is compared: the EARS C1, C2, and C3 methods [4] and a Poisson GLM modeling method [24]. The methods can be used for disease time series as short as 90 days, which is important because many surveillance systems have started only recently and have a limited number of observations. Visualization of the components of variation (interannual, yearly-seasonal, day-of-the-week, and random-error) provides much insight into the properties of disease time series. We recommend these methods be considered by those who manage public health surveillance systems.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
DSE, WSC, and SJG designed the research project, obtained funding, and recruited other members. RPH, DA, and WSC developed and implemented the statistical methods and wrote the first draft of the manuscript. RPH developed the disease outbreak detection method and carried out the simulations. RPH, DA, WSC, and RM conducted the literature review. AA, MY, MO, and SJG designed the data collection, the database schema, and the form of the continuous data feed. All authors participated in the review of the manuscript and approved it.
Acknowledgements
This work has been funded by the Regional Visualization and Analytics Center (RVAC) program of the U.S. Department of Homeland Security.
References

1. Burkom H: Development, adaptation, and assessment of alerting algorithms for biosurveillance.

2. Burkom H, Murphy S, Shmueli G: Automated time series forecasting for biosurveillance. Stat Med 2007, 26(22):4202-4218.

3. Reis B, Mandl K: Time series modeling for syndromic surveillance. BMC Med Inform Decis Mak 2003, 3:2.

4. Hutwagner L, Thompson W, Seeman G, Treadwell T: The bioterrorism preparedness and response Early Aberration Reporting System (EARS). J Urban Health 2003, 80:i89-i96.

5. Wallenstein S, Naus J: Scan statistics for temporal surveillance for biologic terrorism. MMWR Morb Mortal Wkly Rep 2004, 53(Suppl):74-78.

6. Brillman J, Burr T, Forslund D, Joyce E, Picard R, Umland E: Modeling emergency department visit patterns for infectious disease complaints: results and application to disease surveillance. BMC Med Inform Decis Mak 2005, 5(4):1-14.

7. Dafni U, Tsiodras S, Panagiotakos D, Gkolfinopoulou K, Kouvatseas G, Tsourti Z, Saroglou G: Algorithm for statistical detection of peaks – syndromic surveillance system for the Athens 2004 Olympic Games. MMWR Morb Mortal Wkly Rep 2004, 53:86-94.

8. Kleinman K, Abrams A, Kulldorff M, Platt R: A model-adjusted space-time scan statistic with an application to syndromic surveillance. Epidemiol Infect 2005, 133(3):409-419.

9. Wong W, Moore A, Cooper G, Wagner M: Rule-based anomaly pattern detection for detecting disease outbreaks. In Proceedings of the Eighteenth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press; 2002:217-223.

10. Burr T, Graves T, Klamann R, Michalak S, Picard R, Hengartner N: Accounting for seasonal patterns in syndromic surveillance data for outbreak detection. BMC Med Inform Decis Mak 2006, 6:40.

11. Stroup D, Wharton M, Kafadar K, Dean A: Evaluation of a method for detecting aberrations in public health surveillance data. Am J Epidemiol 1993, 137(3):373-380.

12. Hutwagner L, Maloney E, Bean N, Slutsker L, Martin S: Using laboratory-based surveillance data for prevention: an algorithm for detecting Salmonella outbreaks. Emerg Infect Dis 1997, 3:395-400.

13. Farrington C, Andrews N, Beale A, Catchpole M: A statistical algorithm for the early detection of outbreaks of infectious disease. J R Stat Soc Ser A Stat Soc 1996, 159(3):547-563.

14. Stern L, Lightfoot D: Automated outbreak detection: a quantitative retrospective analysis. Epidemiol Infect 1999, 122(1):103-110.

15. Simonsen L: The impact of influenza epidemics on mortality: introducing a severity index. Am J Public Health 1997, 87(12):1944-1950.

16. Hutwagner L, Browne T, Seeman G, Fleischauer A: Comparing aberration detection methods with simulated data. Emerg Infect Dis 2005, 11(2):314-316.

17. Grannis S, Wade M, Gibson J, Overhage J: The Indiana public health emergency surveillance system: ongoing progress, early findings, and future directions. AMIA Annu Symp Proc 2006, 304-308.

18. Olszewski R: Bayesian classification of triage diagnoses for the early detection of epidemics. In Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference. Menlo Park, CA: AAAI Press; 2003:412-7.

19. Cleveland R, Cleveland W, McRae J, Terpenning I: STL: a seasonal-trend decomposition procedure based on loess (with discussion). J Off Stat 1990, 6(1):3-73.

20. Franz D, Jahrling P, Friedlander A, McClain D, Hoover D, Bryne W, Pavlin J, Christopher G, Eitzen E: Clinical recognition and management of patients exposed to biological warfare agents. JAMA 1997, 278(5):399-411.

21. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2005. [http://www.R-project.org]

22. Cleveland W, Devlin S: Locally weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc 1988, 83(403):596-610.

23. Johnson NL, Kotz S, Kemp A: Univariate Discrete Distributions. New York: Wiley; 1992.

24. Jackson M, Baer A, Painter I, Duchin J: A simulation study comparing aberration detection algorithms for syndromic surveillance. BMC Med Inform Decis Mak 2007, 7:66.

25. Mandl K, Reis B, Cassa C: Measuring outbreak-detection performance by using controlled feature set simulations. MMWR Morb Mortal Wkly Rep 2004, 53(Suppl):130-136.

26. Sartwell P: The distribution of incubation periods of infectious disease. Am J Hyg 1950, 51(3):310-318.

27. Meselson M, Guillemin J, Hugh-Jones M, Langmuir A, Popova I, Shelokov A, Yampolskaya O: The Sverdlovsk anthrax outbreak of 1979. Science 1994, 266(5188):1202-1208.
Prepublication history
The prepublication history for this paper can be accessed here: