Department of Computer Science, University of Saskatchewan, Saskatoon, Canada

Department of Community Health and Epidemiology, University of Saskatchewan, Saskatoon, Canada

Abstract

Background

The contact networks between individuals can have a profound impact on the evolution of an infectious outbreak within a network. The impact of the interaction between contact network and disease dynamics on infection spread has been investigated using both synthetic and empirically gathered micro-contact data, establishing the utility of micro-contact data for epidemiological insight. However, the infection models tied to empirical contact data were highly stylized and were not calibrated or compared against temporally coincident infection rates, or omitted critical non-network based risk factors such as age or vaccination status.

Methods

In this paper we present an agent-based simulation model firmly grounded in disease dynamics, incorporating a detailed characterization of the natural history of infection, and 13 weeks' worth of micro-contact and participant health and risk factor information gathered during the 2009 H1N1 flu pandemic.

Results

We demonstrate that the micro-contact data-based model yields results consistent with the case counts observed in the study population, derive novel metrics based on the logarithm of the time degree for evaluating individual risk based on contact dynamic properties, and present preliminary findings pertaining to the impact of internal network structures on the spread of disease at an individual level.

Conclusions

Through the analysis of detailed output of Monte Carlo ensembles of agent-based simulations, we were able to recreate many possible scenarios of infection transmission using an empirically grounded dynamic contact network, providing a validated and grounded simulation framework and methodology. We confirmed recent findings on the importance of contact dynamics, and extended the analysis to new measures of the relative risk of different contact dynamics. Because exponentially more time spent with others correlates with a linear increase in infection probability, we conclude that network dynamics have an important, but not dominant, impact on H1N1 transmission in our study population.

Background

The threat of emerging infectious diseases has stimulated the search for techniques to prevent and control communicable disease spread

Data collected by contact tracing

Some early work in the linking of health and micro-contact data has been reported

As the first influenza pandemic in decades, the H1N1 pandemic – whose initial outbreak was described in April 2009 – served as a catalyst for research into control of emerging infectious diseases. Within the study site of Saskatoon (a Midwestern Canadian city of approximately 250,000 people), H1N1 first emerged in spring 2009 and followed the typical pattern of summer quiescence and autumnal re-emergence. By mid-October, cases of H1N1 began a notable rise

In anticipation of the significance of the 2009–2010 influenza season, the co-authors had launched a previously-described

In this work, we sought to integrate rich contact micro-data collected in

Unlike

1. A novel methodology for integrating disease, population level, and micro-contact data into a coherent agent-based simulation framework, validated by comparison with the actual health status of the study population;

2. A comparison of metrics for measuring the risk associated with contact and contact-duration, culminating in a novel measure: log time-degree (LTD);

3. A demonstration of the utility of micro-contact data during an epidemic outbreak based on both empirical and simulation results;

4. A preliminary investigation into the role of dynamic network structure on the spread of disease, and the impact of vaccination on that structure.

Methods

Our primary data source was Flunet

**Simulation structure and flow.**

The simulation model is encapsulated in the dashed box. Agents remained in a susceptible state unless acted on by an infection event. Such stochastic events were triggered externally using the exogenous pressure data derived from case reports

We performed Monte Carlo ensembles of stochastic dynamic simulations operating on the contact data, where the primary variables drawn from distributions were disease stage durations, exogenous infection event rates and the probability of transmission from infected endogenous contacts. Ensembles were selected in a memoryless fashion based only on the disease parameters. In every realization (simulation run), the contact record was stepped through like an animation, creating exactly the same sequence of contacts in the course of every Monte Carlo realization, which we have termed a Groundhog Day technique, after the 1993 movie of the same name. This is similar to the technique we employed in
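The replay idea can be illustrated with a minimal Python sketch. It is not the authors' ns-3 implementation: the contact tuples, the single per-contact transmission probability `p_transmit`, and the function names are hypothetical, and the disease stages are omitted. It only shows that every realization walks the identical contact sequence while the random draws differ.

```python
import random

# Hypothetical fixed contact record: (time_slot, agent_a, agent_b).
CONTACTS = [(0, 0, 1), (1, 1, 2), (2, 2, 3), (3, 0, 3)]


def run_realization(contacts, p_transmit, seed_agent, rng):
    """Replay the identical contact sequence once; only the random draws differ."""
    infected = {seed_agent}
    for _slot, a, b in contacts:  # same order in every realization
        if (a in infected) != (b in infected):  # discordant pair
            if rng.random() < p_transmit:
                infected.update((a, b))
    return infected


def run_ensemble(contacts, p_transmit, seed_agent, n_runs, seed=0):
    """Run n_runs realizations over the same contact record."""
    rng = random.Random(seed)
    return [run_realization(contacts, p_transmit, seed_agent, rng)
            for _ in range(n_runs)]
```

Because the contact record is held fixed, any variation between realizations in this sketch comes from the stochastic draws alone, which is the property the Groundhog Day technique relies on.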

Contact data

The Flunet study population consisted of 36 participants, each carrying a small wireless sensor (or “mote”) capable of short-range wireless communication

When two motes were in close proximity, they would each record a contact with a minimum resolution of 30 seconds. Each contact record represented a contact session between two motes, which included the start and end time of a contact, and the distance between the adjacent motes. A contact’s distance was estimated by binning the received signal strength indicator (RSSI, a measure of the wireless signal strength) into close (< 5 m), medium (5–15 m) and far (>15 m) bins

A preliminary analysis of the dataset is provided in

**Flunet Findings.** **a)** Contact histogram by hour of day, **b)** CCDF of contact duration, **c)** Connectivity graph with threshold of 18 minutes per day average contact. Black nodes represent stationary nodes associated with a location, and are included in this graph for illustrative purposes only. **d)** Network span for close and all contacts.

Figure

To visually highlight the impact of cliques and place on the dataset, Figure

Given the importance of network structure, we consider the span of the network in Figure

Transmission model

Model design

The simulation model classified each individual in the sample population into one of seven states:

Dynamic transmission models differ in their treatment of contacts. For some epidemiological contexts, the contacts underlying transmission are of defined or bounded duration – for example, needle sharing, sexual encounters, and blood transfusions. For this class of contacts, the frequency rather than the duration of contacts is the primary source of variability in transmission risk. For air-borne infections, however, the likelihood of transmission rises not only with contact frequency, but also with contact duration

For the case of H1N1 influenza transmission, our model assumes that on-going contact between two discordant individuals provides a conduit for transmission, where the likelihood density of transmission is a constant independent of contact duration. More specifically, we posit that an infectious individual gives rise to potentially contagious events (e.g., sneeze or cough) at a fixed rate
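Under a constant transmission hazard, the probability that a contact of a given length transmits infection takes an exponential form. The sketch below assumes a hypothetical per-slot hazard `hazard_per_slot` (the paper's actual rate and symbols are not reproduced here). It checks that stepping through 30-second slots one at a time gives the same cumulative probability as the closed form.

```python
import math


def transmission_probability(hazard_per_slot, n_slots):
    """P(at least one transmission) over n_slots of contact at constant hazard."""
    return 1.0 - math.exp(-hazard_per_slot * n_slots)


def stepped_probability(hazard_per_slot, n_slots):
    """Same quantity built slot by slot, as a time-stepped simulation would."""
    p_slot = transmission_probability(hazard_per_slot, 1)
    escape = 1.0
    for _ in range(n_slots):
        escape *= 1.0 - p_slot  # probability of escaping infection in every slot so far
    return 1.0 - escape
```

With a constant hazard, total time in contact is what matters: many short contacts and one long contact of the same total duration give the same cumulative probability in this sketch.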

Given this model, the basic reproductive number R_{0} (the average number of secondary infections caused by an infective individual, in an otherwise susceptible population) is as follows:

where

For endogenous infections, we assumed that the mean of the basic reproductive number R_{0} for our study population was equal to 1.31, identical to that reported in a prominent Canadian H1N1 study

Where _{i} is set to 3.38

where _{p} represents the number of study participants, _{d} gives the study duration, and _{c}(_{d} days.

The per-week infection hazard of acquiring the infection from exogenous sources was determined according to the following formula:

where _{i} gives the number of laboratory confirmed H1N1 cases in the province of Saskatchewan during the i^{th} week of the 2009–2010 influenza season, and _{U} refers to the total population of the province. The denominator represents an estimate of the number of susceptible individuals in the province. The entire denominator thus estimates the number of susceptible individuals who remain at risk in week

**Weekly laboratory-confirmed H1N1 cases reported in Saskatchewan.**

A susceptible agent receiving the infection from either exogenous or endogenous sources transitions to the latent state. Before starting the latent period, the model computed the duration for each of the subsequent four stages of illness (Figure _{Inc}) and duration of symptoms (_{S}) from two log-normal distributions with parameters from _{Ill} was calculated by adding _{Inc} and _{S}. Using these three values, the total duration of infectiousness _{Inf} was calculated as:

where _{Ill} gives the computed duration of illness, _{sInf} was estimated using:

The remainder of the durations were computed using following subtractions:

where _{aInf} represents the asymptomatically infectious duration, _{lat} shows the latent period duration, and _{nInf} shows the symptomatic non-infectious duration.

Each infected agent experienced the four illness states sequentially with the passage of time. A person in the

For the sake of the simulation, we assumed no H1N1 mortality. Our study lacked sufficient data to predict whether a specific individual would elect to self-quarantine given a symptomatic infection, and did not consider hospitalization outcomes. Given these assumptions, we chose to regard an individual’s contact patterns as unaffected by the health status of that individual and those around them. To examine the degree to which these assumptions might shape simulation results, we simulated an additional Monte Carlo ensemble examining the extreme situation in which individuals removed themselves from circulation for the duration of their symptomatic period. Finally, in light of the dominance of the H1N1 strain during the Saskatchewan 2009–2010 influenza season, only one strain of influenza was considered.

Simulation setup

The model described in the previous sections was implemented in Network Simulator 3 (ns-3), a discrete-event simulator. A network of 36 agents was created, where each agent represented one individual. The Flunet study data was discretized into 30-second time slots, and at each time slot the connectivity between each pair of individuals was updated based on the contacts recorded in the Flunet dataset. This dynamic contact network can be visualized as a time-varying graph where edges appear or disappear every time step depending on whether the two participants were in contact. The network could also be effectively encoded as a fully connected graph where edge weights at every time step have a value of 0 (unconnected) or 1 (connected). The fully connected graph representation can easily be implemented as a time series of sparse symmetric matrices (one for every time step) where 0 represents no connection between the node (
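One way to hold such a time-varying graph in memory is a mapping from slot index to the set of undirected edges active in that slot, which plays the role of one sparse symmetric 0/1 matrix per time step. The sketch below is an illustration only. The session tuple layout `(start_s, end_s, a, b)` and the function names are assumptions, not the Flunet record format or the ns-3 code.

```python
from collections import defaultdict

SLOT_SECONDS = 30  # slot length used to discretize the contact record


def build_slot_edges(sessions):
    """Map slot index -> set of frozenset({a, b}) edges active in that slot.

    sessions: iterable of (start_s, end_s, a, b), times in seconds,
    with end_s treated as exclusive.
    """
    slots = defaultdict(set)
    for start_s, end_s, a, b in sessions:
        first = start_s // SLOT_SECONDS
        last = (end_s - 1) // SLOT_SECONDS
        for k in range(first, last + 1):
            slots[k].add(frozenset((a, b)))  # undirected: {a, b} == {b, a}
    return slots


def in_contact(slots, k, a, b):
    """True if agents a and b are connected in slot k (edge weight 1)."""
    return frozenset((a, b)) in slots.get(k, set())
```

Storing only the active edges keeps memory proportional to the number of recorded contacts rather than to the number of agent pairs times the number of slots.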

To estimate _{x}, the model required a time series of the laboratory-confirmed H1N1 cases in Saskatchewan. This data was extracted from the Public Health Agency of Canada FluWatch ^{th} week (January 4^{th} 2010), and therefore _{x} in the model from the 10^{th} week to the end of the simulation was zero.

**Observed attack rate based on endogenous and exogenous infection pressure.** Attack rate (fraction of endogenous population infected) according to different assumptions about endogenous and exogenous infection pressure. In the left-hand panel, vaccination effect is incorporated, while in the right hand panel no vaccination is considered.

Each susceptible agent drew from a distribution at each time-step to determine whether it was infected by exogenous sources. If it contracted the infection – whether from exogenous or endogenous sources – the agent determined the integral duration (in units of time-steps) for each state of the infection based on the equations explained in the Model design section, and remained in each state for the determined number of time-steps. During the infectious period, the agent drew from a distribution to determine whether it infected other nearby susceptible agents.
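The progression through pre-computed stage durations can be sketched as a simple lookup. The stage order used here (latent, asymptomatic infectious, symptomatic infectious, symptomatic non-infectious) and the final `"recovered"` label are assumptions made for illustration. The stage names and the helper `stage_at` are hypothetical and do not come from the authors' code.

```python
from itertools import accumulate

# Assumed stage order; durations are integral numbers of time-steps.
STAGES = (
    "latent",
    "asymptomatic_infectious",
    "symptomatic_infectious",
    "symptomatic_noninfectious",
)


def stage_at(steps_since_infection, durations):
    """Return the stage an agent occupies a given number of steps after infection."""
    for name, end in zip(STAGES, accumulate(durations)):
        if steps_since_infection < end:
            return name
    return "recovered"  # assumed terminal state once all four stages have elapsed


def is_infectious(stage):
    """Only the two infectious stages can transmit in this sketch."""
    return stage in ("asymptomatic_infectious", "symptomatic_infectious")
```

Drawing all four durations once at the moment of infection, as the text describes, means the per-step work reduces to this lookup plus the transmission draw.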

Scenarios

The simulation explored a three-dimensional scenario space that examined the impact on model outputs of four distinct assumptions. The first two assumptions related to the exogenous and endogenous forces of infection (FOI). An exogenous FOI coefficient linearly scaled _{x} to values that were 1, 2, 4, 8, 16, and 32 times the baseline. Similarly, the endogenous FOI coefficient scaled

The third assumption varied was whether the H1N1 vaccination status from

One supplementary baseline scenario explored the impact of participants removing themselves from circulation during their symptomatic period. Note that to compute

In total, the scenario space consisted of three baseline scenarios and 72 additional scenarios. Each baseline scenario was simulated using 100,000 Monte Carlo realizations; the other 72 alternative scenarios were each simulated using 2,500 Monte Carlo realizations. Exploration of the scenario space (including the baselines and alternative scenarios) required running 480,000 different realizations.
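The total follows directly from the scenario counts and run sizes stated above, as this short arithmetic check shows (the variable names are illustrative):

```python
BASELINE_SCENARIOS = 3
BASELINE_RUNS = 100_000
ALTERNATIVE_SCENARIOS = 72
ALTERNATIVE_RUNS = 2_500

# 3 * 100,000 = 300,000 baseline runs; 72 * 2,500 = 180,000 alternative runs.
total_realizations = (BASELINE_SCENARIOS * BASELINE_RUNS
                      + ALTERNATIVE_SCENARIOS * ALTERNATIVE_RUNS)
print(total_realizations)  # 480000
```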

Metrics for contact networks structure

While static representations of social networks are convenient, popular, and can yield powerful insights

Betweenness centrality is a classic measure of network structure that attempts to capture the importance of the node to the graph’s connectivity, by summing the number of times a node lies on the shortest path between two other nodes, calculated using:

where _{ab} is the number of shortest paths between _{ab}(

While betweenness captures a global picture of the network by examining shortest paths, degree centrality only considers a node’s number of one-hop neighbors. For a static graph, degree centrality is calculated according to:

where deg (

Time degree centrality (TD) for a node can be defined as the average over all time slots of the fraction of all other agents with whom that node is in contact in a given time slot (analogous to the “strength” metric proposed in

where _{k} is the total number of time slots in the period and _{D}^{th} vertex at time
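The verbal definition of TD (the average, over all time slots, of the fraction of other agents a node is in contact with) can be sketched as follows. The data layout (a dict from slot index to a set of `frozenset` edges), the function names, and the use of the natural logarithm for LTD are assumptions of this sketch, not details confirmed by the text.

```python
import math


def time_degree(slot_edges, node, n_agents, n_slots):
    """Average over slots of the fraction of other agents in contact with `node`."""
    total = 0.0
    for k in range(n_slots):
        degree = sum(1 for edge in slot_edges.get(k, ()) if node in edge)
        total += degree / (n_agents - 1)
    return total / n_slots


def log_time_degree(td):
    """Log of TD; the natural log is an assumption of this sketch."""
    return math.log(td) if td > 0 else float("-inf")


# Hypothetical toy record: 3 agents, 2 time slots, contacts only in slot 0.
slot_edges = {0: {frozenset((0, 1)), frozenset((0, 2))}}
td0 = time_degree(slot_edges, 0, n_agents=3, n_slots=2)  # (2/2 + 0) / 2
td1 = time_degree(slot_edges, 1, n_agents=3, n_slots=2)  # (1/2 + 0) / 2
```

Because TD values for sparsely connected participants are very small, the logarithm spreads them over a wider range, which is what makes LTD convenient for linear comparisons.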

If the heterogeneity of the system is dependent on the network structure, then the likelihood of a participant’s infection at some point during the study should be correlated with appropriate network structure metrics. Using the MATLAB statistical toolbox, we computed Pearson and Spearman correlations between the probability of infection in two baseline simulations (with and without vaccination) and the four measures of centrality described above. Given that an individual’s network location may also shape their likelihood of transmitting a pathogen when infected, for the same scenarios we also correlated the four centrality measures with the average number of secondary endogenous infections directly caused by a node once it was infected. Finally, to better understand the effect of vaccination status on the correlations derived above, we also used Student’s

Results

We analysed the response of our simulated infections to changing endogenous and exogenous infection pressure and the proximity threshold required for transmission to confirm that the simulation did not produce any significant artefacts. This served both as a cross-check on the H1N1 influenza model proposed in

Transmission model

Figure

Self-reporting of participants’ health conditions in the Flunet dataset

Figure

**Number of infections per week.** The number of exogenous and endogenous infections per week without vaccination (left), and with vaccination (right) over the course of 100,000 runs.

Impact of overall network structure

Having established that the disease model and parameters are broadly consistent with the empirical observations regarding the H1N1 outbreak in the study population in Fall 2009, we used two scenarios with 100,000 realizations each (with parameters described in the Methods section and covering scenarios with and without vaccination) to evaluate the impact of network structure and dynamics on the spread of disease. Unlike most previous work in agent-based modeling, this study had recourse to detailed contact records containing not only high-fidelity temporal data, but also proximity estimates. By constraining our inquiry to a single contact criterion and ILI, we leveraged the strength of our dataset to investigate the impact of contact network structure and contact duration on the spread of a specific disease. Because of our large-scale Monte Carlo ensembles, we believe that the variations in the underlying H1N1 model data have been well explored; therefore, we expect heterogeneity in the simulation results to be dominated by the impact of network structure and contact duration rather than by simulation artifacts.

Depth of infection

Because the duration of contacts is characterized by an approximate power law relationship for much of its span (Figure

**Depth of infection spread.** Depth of infection spread for scenarios with consideration of vaccination (x’s) and without consideration (o’s).

In both cases, the plot seems to generally follow a classic small-world network heavy-tailed power law distribution

Infection impacts

Before analysing impacts, it is necessary to establish appropriate metrics for measuring network connectivity. Table

| Measure | Pearson ρ | Pearson p | Spearman ρ | Spearman p | Pearson ρ | Pearson p | Spearman ρ | Spearman p |
|---|---|---|---|---|---|---|---|---|
| Betweenness | 0.172 | 0.316 | 0.245 | 0.149 | 0.110 | 0.522 | 0.239 | 0.160 |
| Degree | 0.415 | 0.012 | 0.292 | 0.084 | 0.296 | 0.080 | 0.258 | 0.128 |
| TD | 0.514 | 0.001 | 0.744 | <0.001 | 0.344 | 0.040 | 0.519 | 0.001 |
| LTD | 0.740 | <0.001 | 0.744 | <0.001 | 0.503 | 0.002 | 0.519 | 0.001 |

Table

While the correlations between time degree and probability of infection remain significant in the simulations that included vaccination information, the degree of correlation diminishes. This result is not surprising, as vaccination has a direct impact on the likelihood of infection, which goes to zero in the model regardless of the individual’s network connectivity. This observation is interesting for two reasons: first, it demonstrates that independent variation in infection likelihood diminishes the impact of network structure; and second, that even in the face of a highly non-linear, but not universal, disturbance (not all nodes are vaccinated), the underlying impact of network structure remains significant.
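The two correlation coefficients used in this analysis can be sketched in plain Python. The study itself used the MATLAB statistical toolbox, so the functions below are a stand-in written for illustration, and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical TD values and infection probabilities for four participants.
td = [0.001, 0.01, 0.1, 1.0]
ltd = [math.log(v) for v in td]
p_infect = [1.0, 2.0, 3.0, 4.0]
```

Because Spearman's coefficient depends only on ranks, it is unchanged by the logarithm. This is consistent with TD and LTD sharing the same Spearman values in the tables while their Pearson values differ.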

Student’s

| | Betweenness | Degree | TD | LTD |
|---|---|---|---|---|
| p-Value | 0.93 | 0.723 | 0.721 | 0.911 |

Transmission impacts

While network structure impacts the population spread of a pathogen due to its strong effect on infection risk, it also changes risk of transmission given infection. Table

| Measure | Pearson ρ | Pearson p | Spearman ρ | Spearman p | Pearson ρ | Pearson p | Spearman ρ | Spearman p |
|---|---|---|---|---|---|---|---|---|
| Betweenness | 0.184 | 0.283 | 0.244 | 0.151 | 0.139 | 0.420 | 0.253 | 0.137 |
| Degree | 0.399 | 0.016 | 0.300 | 0.076 | 0.302 | 0.073 | 0.296 | 0.080 |
| Time Degree | 0.665 | <0.001 | 0.895 | <0.001 | 0.472 | 0.004 | 0.615 | <0.001 |
| Log Time Degree | 0.802 | <0.001 | 0.895 | <0.001 | 0.590 | <0.001 | 0.615 | <0.001 |

Finally, significant correlations between likelihood of infection and both TD and LTD centralities were maintained for those Monte Carlo ensembles in which behavior change was assumed to limit infection transmission to the asymptomatic period. The persistence of the correlations in this extreme case suggests that even in the presence of strong behavioral change on the part of symptomatic infectives themselves, duration-based measures are likely to remain important indicators of infection risk.

Impact of internal network structure

While our number of participants is insufficient to make strong generalizations about the impact of internal network structure on the transmission of infection, the simulations based on the data can illustrate effects that may be observed in larger networks. The first and most apparent is the correlation between log time degree and the risk of both infection and transmission. The impact of network structure and vaccination on endogenous cumulative infection probability can be illustrated by plotting LTD against the cumulative likelihood of infection for both scenarios, as shown in Figure

**Impact of LTD on endogenous infection probability.** Impact of a node’s log-transformed TD centrality (LTD) and immunization on endogenous infection probability. Results from two simulation scenarios are shown: One where the vaccination effect is considered (x’s), and another where this effect is ignored (o’s). The red line indicates the least squares fit for the case without vaccination. Dashed lines represent outliers; solid lines denote a single identified subnet.

Although LTD centrality only offers an approximation of the likelihood of infection, the log-linear regression suggests a strong dependence on network structure in the absence of other effects. People with larger LTD centrality are linearly more likely to get infected, indicating a degree of predictive power. To fully verify this hypothesis requires a more rigorous statistical treatment and a larger participant population. This relation has limited predictive power because the probability of infection and therefore the slope of the line depend not only on the LTD centrality, but the parameters of the disease, and individual risk factors, which may be difficult to derive or collect in practice. The most we can conclude is that people with larger TD centrality will have a measurably increased risk of infection, with all other factors held equal.
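A least-squares line of infection probability against LTD, of the kind referred to here, can be computed as in the sketch below. The data points are hypothetical and chosen only to exercise the function; they are not values from the study.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical (LTD, cumulative infection probability) pairs.
ltd_values = [-10.0, -8.0, -6.0, -4.0]
infection_prob = [0.02, 0.06, 0.10, 0.14]
slope, intercept = least_squares_line(ltd_values, infection_prob)
```

As the text cautions, the fitted slope depends on the disease parameters and individual risk factors as well as on LTD, so a line fitted to one outbreak should not be expected to transfer directly to another.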

The graph also indicates that there exists an LTD centrality below which it is impossible to become endogenously infected (near −11). Our own simulated data refutes this, as even the least connected individuals had non-zero endogenous infection counts. However, the

There are two primary sets of outliers in Figure

To better visualize the impact of network structure and vaccination, we created a network graph that shows the participants’ log degree centrality as the size of the node, and the number of infectious events between those nodes as the width and color of the line between them. Figure

**Infection transmission across network.** Representations of infection rates across the network without **(a)** and with **(b)** vaccination. Node size represents participant LTD, node edge color and width represent infection events.

Several qualitative observations can be made from Figure

The impact of local effects on the office staff subnet is particularly pronounced, likely due to their strong intra-subnet and weak inter-subnet connections. If this were the case, then endogenous infections should be preferentially transmitted between members of the main office subnet, and not with members outside their subnets. Table

| | Member 1 | Member 2 | Member 3 | Member 4 |
|---|---|---|---|---|
| Fraction of transmission from/to outside members | 6.88% | 6.54% | 17.89% | 0.56% |
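Per-member fractions of this kind could be tallied from a log of simulated endogenous transmission events roughly as follows. The `(source, target)` event format, the agent identifiers, and the function name are hypothetical; the sketch does not reproduce the authors' bookkeeping.

```python
def outside_fraction(events, member, subnet):
    """Fraction of a member's transmission events whose other party is outside the subnet.

    events: iterable of (source, target) endogenous transmissions.
    """
    involved = [(s, t) for s, t in events if member in (s, t)]
    if not involved:
        return 0.0
    outside = sum(
        1 for s, t in involved
        if (t if s == member else s) not in subnet
    )
    return outside / len(involved)


# Hypothetical event log: agents 1-4 form the subnet, agent 9 is an outsider.
events = [(1, 2), (2, 1), (9, 1), (1, 3)]
office = {1, 2, 3, 4}
fraction_member_1 = outside_fraction(events, 1, office)  # 1 of 4 events involves agent 9
```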

It is clear from the table that the vast majority of the infectious events occur by passing infection inside the office subnet. When one subnet member is infected exogenously, other members stand a much higher chance of infection, given their very large subnet LTD. The main office has a relatively small chance of contracting an infection from the rest of the network, and a much higher chance of contracting an infection within their own subnet. This forces the infection rate for LTD down as shown in Figure

Discussion

The work in this paper builds on approaches explored in important past contributions. These include individual

Limitations

While the results presented here constitute a clear methodological contribution to the study of outbreaks, our techniques and data have implicit limitations. First and foremost, our findings are only demonstrated for the sub-population under observation, and the particular disease dynamics of H1N1. Other researchers

· **Individual Behavior:** We assumed that the contact dynamics of an individual within the dataset were independent of their own and their contacts’ infection status. While we examined an alternative scenario that assumed extreme behavior change following the appearance of symptoms, this scenario does not capture important direct and indirect effects, including the spread of risk perception, social distancing, and proactive protective measures among the study population.

· **Data Set Size:** Consistent with other electronic contact monitoring studies

· **Exogenous Contacts:** Lacking significant data on external contacts, we were forced to assume a uniform exogenous infection pressure for all individuals based on population-level statistics, which may mask potentially important diversity in vulnerability to external infection.

· **Equivalence of Place:** In adapting model parameter estimates from the published H1N1 model, we made the assumption that the populations experienced similar basic reproductive numbers – despite differences in population density and demographics.

Despite such limitations, we believe that the findings presented here emphasize the utility of combining simulation modeling and ambulatory data collection, and highlight the considerable value to be gained in model building, decision making and operational prioritization by adding duration-based measures into surveillance. The findings also underscore the importance of future studies using broader and more diverse study populations, improved understanding of exogenous contact patterns and behavior change, and refined simulation models.

Contributions

Many previous contributions have highlighted heterogeneity in population contact rates. Particularly pronounced heterogeneity has been observed in sexual contacts, for which contact rates appear to obey power law distributions

The importance of contact duration in modeling infection spread reflects two key roles that it plays in infection spread. First, it is a predictor for infection risk. Our results suggest significantly stronger associations between infection risk and duration-related network measures than are found using traditional centrality measures, confirming the results presented by

The desirability of contact-duration measures for modeling, decision-making, and operational prioritization has broader health surveillance implications. Although it is possible to collect time-weighted duration information using traditional self-reporting, results of prominent studies employing self-report have questioned the feasibility of imposing the additional requisite bookkeeping burden, further suggesting that automated mechanisms may be required

While other contributions

Future research directions

While the contributions of this paper are largely methodological – pertaining to the use and utility of micro-contact data in monitoring and modeling outbreaks – we have made several observations based on our data that merit further examination in the future using larger datasets. First, we noted a regressive fit between LTD and infection risk. With a larger study population, one could use standard epidemiological statistical techniques such as logistic regression to disambiguate the relative risk of LTD when compared to other factors such as age, gender, occupation, or socio-economic status. We also hypothesized that the actual relationship between LTD and infection risk would be characterized by a sigmoidal or similar function, with an asymptotic minimum corresponding to a baseline chance of contracting a pathogen from the environment, and an asymptotic maximum corresponding to the point at which additional links confer little additional risk because infectious exposure is almost guaranteed. Substantially larger datasets would be required to probe the extremes of the distribution. Finally, we posited that the internal network structure can impact the probability of infection, by comparing risk between two connected and one isolated subnet of high LTD participants. The isolated subnet received relatively few endogenous infections from the rest of the network, but suffered high mutual infection rates. As the workplace conditions of the main office staff are more in keeping with traditional Western work habits (predominantly defined schedules and locations) than those of graduate students (predominantly undefined schedules, roving locations), it may be that such isolated subnets drive pathogen transmission elsewhere in society more than is currently appreciated. Larger, and ethnographically broader, datasets will be required to properly investigate this hypothesis.

While we have made strong methodological contributions to the study of pathogen spread, the detailed questions and hypotheses generated during our analysis may have a more significant long-term impact.

Conclusions

In this work we have presented the results of combining a micro-contact dataset with population health data and a simulation modeling methodology – termed a Groundhog Day system – to study the impact of contact dynamics on the spread of H1N1 influenza through a small study population during the 2009 flu season. Our results validated the transmission model by providing close agreement with observed infection rates within the study population, as gathered with surveys. We leveraged the temporal span of our data to derive a risk metric – log time degree – which appears to correlate with both risk of being infected and risk of infecting given infection, all other factors held equal. The methodology described here is an important step in leveraging both personal and scientific computing for the study of infectious disease.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MH collected the contact data, implemented and ran the simulations, performed analysis and contributed text to the paper. NO provided the initial experimental and simulation design and contributed to the paper. KS provided the initial data collection design, performed analysis and contributed to writing of the paper. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to acknowledge the Natural Sciences and Engineering Research Council of Canada for providing funding for this research, and the University of Saskatchewan HPC Training Facilities for providing computational resources.
