Department of Infectious Disease Surveillance and Control, National Institute for Health and Welfare, Helsinki, Finland

Abstract

Background

Verotoxigenic

Methods

We enrolled 131 domestically acquired primary cases of VTEC between 1997 and 2006 from routine surveillance data. The isolated strains were characterized by virulence type, serogroup, phage type and pulsed-field gel electrophoresis. By applying a two-part Bayesian hurdle model to infectious disease surveillance data, we were able to create a model in which the covariates were associated with the probability for occurrence of the cases in the logistic regression part and the magnitude of covariate changes in the Poisson regression part if cases do occur. The model also included spatial correlations between neighbouring municipalities.

Results

The average annual incidence rate was 4.8 cases per million inhabitants based on the cases as reported to the NIDR. Of the 131 cases, 74 VTEC O157 and 58 non-O157 strains were isolated (one person had dual infections). The number of bulls per human population and the proportion of the population with a higher education were associated with an increased occurrence and incidence of human VTEC infections in 70 (17%) of 416 of Finnish municipalities. In addition, the proportion of fresh water per area, the proportion of cultivated land per area and the proportion of low income households with children were associated with increased incidence of VTEC infections.

Conclusions

With hurdle models we were able to distinguish between risk factors for the occurrence of the disease and the incidence of the disease for data characterised by an excess of zeros. The density of bulls and the proportion of the population with higher education were significant both for occurrence and incidence, while the proportion of fresh water, cultivated land, and the proportion of low income households with children were significant for the incidence of the disease.

Background

Verotoxigenic

Several risk factors for VTEC infections have been identified from outbreak data, modelling studies, and descriptive and analytical epidemiological studies. Cattle are considered the main reservoir for VTEC

Poisson distribution based regression models are standard models for count data, including for infectious disease surveillance data

The aim of this study was to identify explanatory variables for VTEC infections reported to the NIDR in Finland between 1997 and 2006. VTEC infection is a rare disease in Finland and we used Poisson distribution based hurdle models with variable selection based on in-depth questionnaires and known risk factors. We chose the hurdle model based on its ability to handle an over-dispersion of zeros and applicability to scarce data. The incidence rate of VTEC infection at municipality level was used as an outcome variable and explanatory variables included various agricultural, socioeconomic and environmental factors. These explanatory variables also included several factors that have not been studied with VTEC infections previously. The null hypothesis that the populations in different municipalities in Finland are at the same risk of contracting VTEC infection was tested using a Poisson distribution based hurdle model.

Methods

Microbiological characterisation

The presence, isolation and biochemical identification of VTEC bacteria from stool cultures was done as previously described

Definitions for the microbiologically confirmed VTEC cases

Domestic case

A case of VTEC infection with no history of foreign travel during the two weeks prior to onset of symptoms.

Index case

The VTEC case with "earliest onset of symptoms"- date or isolation date within a household or outbreak.

Household case

A VTEC case living in the same household with the index case.

Outbreak case

Two or more epidemiologically and microbiologically linked VTEC cases caused by strains of identical subtype and not of the same household.

Sporadic case

A VTEC case with no epidemiological or microbiological link to another VTEC case.

Primary case

Index or sporadic case.

Surveillance of VTEC in Finland

For all cases reported to the NIDR since 1997, date of birth, sex, place of residence, the earliest reported date, sampling date, travel information, and death notification were reported. An in-depth interview as a part of the surveillance has been conducted since 1998

The information that was available from both forms or could be obtained by combining information included: Place of residence (postal code), attending school or day care and having VTEC positives in the same household. Clinical information included presence of diarrhoea, visible blood in the stool, HUS or TTP as a complication, admittance to a hospital, and death. Certain risk factors were asked in one or both forms, including living or visiting farms with domestic animals, consumption of raw or undercooked meat, unpasteurized milk or untreated water, and swimming in natural water.

Modelling

As the infectious disease surveillance data is count data and there was an excess number of zeros in the data (mean number of cases 0.31 per municipality with variance of 1.2), we developed a Bayesian Poisson complementary-log log (clog log) hurdle model

Domestic primary cases of VTEC were included in the analysis as outcome variables for the 10-year follow-up period. These were assigned with home municipality according to information on the in-depth questionnaires or NIDR information using a municipality division of 416 municipalities in Finland in 2007. The explanatory variables for modelling included various agricultural, socioeconomic and environmental factors. The results from the descriptive epidemiology of the present study and literature were used to identify potential variables to be included into the model. We performed a univariable analysis in the hurdle model with all potential explanatory variables and in the multivariable analysis we tested those variables that had p-values < 0.20 in the univariable analysis. Of the correlated variables, only the most significant ones and one per correlation group were included in the final model within each group, as marked in Additional file

**Variables included in the modelling study and their sources**. List of agricultural, socioeconomic and environmental variables used in the present study.

Click here for file

**The statistical hurdle model for the VTEC infections**. The statistical hurdle model in detail used in the present study.

Click here for file

**Winbugs code for the hurdle model**. Winbugs code for the model used in the present study. This file can be viewed with Winbugs (

Click here for file

The statistical packages used included R (version 2.11.1), SPSS (version 18), Arcgis (version 9.3) (for data pre-processing), WinBUGS (version 1.4.3) and Stata (version 9.2). The hurdle models are described in the Additional file

Results

Description of the cases

Between 1997 and 2006, 247 cases of microbiologically confirmed VTEC infections were reported to the NIDR in Finland. There was an average annual incidence rate of 4.8 (range 1.9-12.0) per million inhabitants and 25 (10-62) cases on average were identified annually. Of all cases, 46 cases (19%) had a history of foreign travel; these cases were excluded from further analyses. Outbreak cases accounted for 29 (12%) and household cases for 77 (31%) of all the cases, and of these, 36 domestic index cases were enrolled. Altogether 131 domestically acquired primary cases were enrolled. Of the 131 cases, 95 (73%) cases were sporadic.

Of all 247 cases, 249 strains were isolated from primary stool cultures of patients or their close contacts with VTEC infection. Two patients had a dual infection caused by two VTEC strains. Of the 131 domestically acquired primary cases enrolled in this study, 132 VTEC strains were isolated: 74 (56%) were serogroup O157 and 58 (44%) were of the non-O157 serogroup; one case had dual infections of both serogroups. Of the O157 strains, 60 (81%) typed to sorbitol negative and 14 (19%) to the sorbitol positive O157 group. Within the O157 group, the most predominant phage types (PT) were PT2, PT88 or its variant, PT8 and PT4. The genomic DNA of the strains gave 33 and 50 different PFGE patterns for the VTEC O157 and non-O157 strains, respectively. The non-O157 strains typed into 20 serogroups, the most common O groups being O103, O145 and O26.

Of the 131 primary cases, 51 (39%) were aged younger than 5 years and 23 (18%) were older than 40 years. Half (66, 50%) were male, with the gender distribution also being equal in the age groups (data not shown). Interviews were conducted for 83 (63%) cases. The postcode for the home address was known for 25 (30%) of the cases. Secondary household VTEC positives were reported in 24 cases (29%).

The symptoms were typical of VTEC infection: 74 (94%) had diarrhoea and 69 (90%) visible blood in the stools. HUS developed in 21 (28%) cases, and of these, 5 (7%) had TTP; in total, 61 (74%) were admitted to a hospital, and none of the cases died. Information on the reported risk factors was scarce: of the 37 cases aged 0-6 years, 8 (22%) attended day care. Living or visiting a farm with domestic animals was reported in 17 (22%) cases. Unpasteurized milk products were consumed in 16 (20%) cases, raw or undercooked meat was consumed in 17 (24%) cases. Consumption of untreated water was reported in 6 (17%) and swimming in natural water in 3 (8%) cases.

Modelling of the data

The modelling unit was the municipality and between 1 and 11 cases were identified in 70/416 (17%) municipalities (Figure

Incidence rate of sporadic cases of VTEC infections per million inhabitants in Finland by municipality 1997-2006

**Incidence rate of sporadic cases of VTEC infections per million inhabitants in Finland by municipality 1997-2006**.

In the multivariable model, bulls per human population and level of higher education per population were significant in the zero part of the hurdle models, indicating the importance of these variables for the occurrence of the infection (Table _{bulls}/1.386)-1 = 4% and exp(0.01*β_{educ}/1.386)-1 = 4%, respectively. Likewise, the 1% change in bulls, farms, fresh water, educational level and in low income households in the Poisson part increases the standardised incidence ratio (SIR) by exp(0.01*β_{bulls}) = 5%, exp(0.01*β_{farms}) = 2%, exp(0.01*β_{fresh_water}) = 2%, exp(0.01*β_{educ}) = 6%, exp(0.01*β_{low_ income) }= 7%, respectively

Final multivariable hurdle models, with and without the spatial correlation variable

**Variable (*)**

**Hurdle model with spatial correlation variable**

**truncated Poisson**

**zero model**

**Hurdle model**

**truncated Poisson**

**zero model**

Bulls per population

5.0 (2.2-9.1)

5.0 (1.3-9.0)

5.1 (2.2-9.3)

4.8 (1.1-8.7)

Proportion of adult population with middle education or higher

5.6 (2.0-9.1)

5.0 (2.3-8.1)

5.8 (1.9-10.7)

4.6 (1.5-7.7)

Proportion of low income households with children

7.1 (1.9-12.4)

--(ns, excluded)

7.2 (1.9-12.3)

--(ns, excluded)

Number of farms under cultivation per square km

2.0 (0.7-3.2)

--(ns, excluded)

2.1 (0.5-3.3)

--(ns, excluded)

Fresh water per total surface area

1.9 (0.4-3.3)

--(ns, excluded)

1.9 (0.4-3.4)

--(ns, excluded)

(*) Parameter estimates (with 95% equal tail credible intervals)

Interestingly, using a cross validation procedure in the univariable or multivariable hurdle models, several more influential values were found. When two influential observations (in both zero and Poisson parts of the model) were removed from the data, the fresh water variable became negatively significant. When examining these municipalities in more detail, it became plausible that there may have been prolonged, undetected outbreaks. One of these municipalities was a medium size city with nine cases, with six cases during one calendar year and of those, four had identical PFGE profiles. The other was a small municipality with a population of just over 10 000 that had four cases during a two-year period with different PFGE patterns. Furthermore, these municipalities were among those with the highest proportion of fresh water, suggestive of water borne outbreaks. Also the number of cattle was high in these municipalities.

Discussion

VTEC infection has a low average annual incidence rate of 4.8 per million inhabitants in Finland. The cases described in the present study were predominantly children with typical symptoms of VTEC infection: Most cases had haemorrhagic diarrhoea and some cases of HUS were identified. Half of the cases were infected with O157 serogroup strains. Detailed microbiological analysis including genotyping was essential to distinguish between outbreak and non-outbreak strains. Approximately a third of cases had a household contact who was VTEC infection positive, indicating direct transmission of the infection. Many of the cases reported visits to farms or consuming undercooked meat or unpasteurized milk; however, it was difficult to assess the impact of the risk factors obtained from the in-depth questionnaires as the information was scarce. Bias due to under-reporting of the cases is considered to be low for VTEC; however, diagnostic differences may occur between health care districts.

Hurdle modelling has not yet been widely used on infectious disease surveillance data

We identified the proportion of beef cattle to human population and the proportion of population with higher education as being associated with increased occurrence and incidence of VTEC infections. The proportion of the population with higher education is likely to represent an indirect indicator of consumption habits, possibly eating undercooked beef steak. In addition, the number of farms under cultivation per area, fresh water per area and the proportion of low income households with children were associated with increased incidence of the infection. These are likely to represent routes for the spread of the infection; the low income households may be an indirect indicator of consumption habits. The model without the spatial variable essentially gave the same results. Our results concur with a US study on ecological factors for gastrointestinal infections, where the proportion of the population living on a farm (positive association), low education (negative association) and living in the south (negative association) were the most important socioeconomic factors for acquiring a VTEC infection

The finding of beef cattle as a strong explanatory variable was not unexpected as cattle have been found to be an important risk factor for VTEC infections in epidemiological and modelling studies

The result for the proportion of fresh water surface per area is intriguing. Previous studies suggest drinking and swimming water as a source of VTEC infections and outbreaks

The finding that the proportion of the population with a higher education and the proportion of low income households with children being associated with increased incidence of the infection is somewhat contradictory. However, we feel that these two variables may explain different phenomena. There is conflicting evidence about educational level as a risk factor for VTEC infections

The study was limited by the low number of VTEC cases; therefore spatiotemporal analysis was not used. This was compensated for by using specific models designed for scarce data and performing a purely spatial analysis. The seasonal trends could not be evaluated properly due to the low number of cases. Another limitation of the study was the unavailability of data regarding consumption of beef or other food items per municipality. It has been found in Scotland that of all VTEC infections, 16% of cases had a significant food-related exposure, such as eating high-risk foods or having a contact working with meat

In conclusion, cattle are the main reservoir for VTEC infections as well as being also an important factor for the incidence of the disease. In addition, higher education per population was found to be associated both with the occurrence and the incidence of the disease, implying that food exposures may play a role. The role of fresh water and other environmentally related variables warrants further studies. Socioeconomic factors like low income and educational level expedite the incidence of VTEC infection. The recommendations for control measures for the prevention of VTEC infections given by the WHO include good hygiene in animal and slaughter process and food retail, consumer education on proper handling of foods, and up-to-date international and national legislation and regulation

Conclusions

We enrolled 131 cases of VTEC with 74 VTEC O157 and 58 non-O157 strains isolated (one person had dual infections). With Poisson hurdle models we were able to distinguish between risk factors for the occurrence of the disease (zero part of the model) and incidence of the disease (zero truncated Poisson part of the model) for data characterised by an excess of zeros in the case counts. The number of bulls per human population and the proportion of population with a higher education were identified as being associated with increased occurrence and incidence of human VTEC O157 and non-O157 infections. The proportion of fresh water per area, the proportion of cultivated land per area and the proportion of low income households with children were associated with increased incidence of VTEC infections. We recommend using a Bayesian framework with imputation for the missing values of the explanatory variables in conjunction with the missingness equations.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

KJ performed the analysis of the descriptive epidemiology data and the statistical analysis and drafted the manuscript. JO designed the statistical analysis and helped to perform it. ME performed the microbiological analysis and participated in the analysis of the descriptive epidemiology. AS participated in the design and coordination of the microbiological analysis. MK participated in the design and coordination of the epidemiological analysis. All authors read and approved the final manuscript.

Acknowledgements

We thank Raili Ronkainen for assistance with the data handling and Tarja Heiskanen, Aino Kyyhkynen and Anna Liimatainen for their laboratory assistance.

Pre-publication history

The pre-publication history for this paper can be accessed here: