Section of Integrative Biology, University of Texas at Austin, 1 University Station C0930, Austin, TX, 78712, USA

Lewis-Sigler Institute for Integrative Genomics and Department of Chemistry, Princeton University, Princeton, NJ, 08544, USA

Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA

Department of Epidemiology and Public Health, Yale University Medical School, New Haven, CT, 06520, USA

Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA

Abstract

Background

Contact network models have become increasingly common in epidemiology, but we lack a flexible programming framework for the generation and analysis of epidemiological contact networks and for the simulation of disease transmission through such networks.

Results

Here we present EpiFire, an applications programming interface and graphical user interface implemented in C++, which includes a fast and efficient library for generating, analyzing and manipulating networks. Network-based percolation and chain-binomial simulations of susceptible-infected-recovered disease transmission, as well as traditional non-network mass-action simulations, can be performed using EpiFire.

Conclusions

EpiFire provides an open-source programming interface for the rapid development of network models with a focus in contact network epidemiology. EpiFire also provides a point-and-click interface for generating networks, conducting epidemic simulations, and creating figures. This interface is particularly useful as a pedagogical tool.

Background

Epidemiological models traditionally assume mass-action dynamics: individual hosts in a population have identical contact rates and are well-mixed, such that any given pair may interact and transmit disease with equal probability

Contact network epidemiology

The 2003 SARS outbreak in China illustrates one limitation of mass-action models. Early estimates of the basic reproduction number (_{0}) for SARS predicted that 120 days after the disease was introduced in China, between 30,000 and 10 million individuals would have been infected in China

The spread of sexually transmitted infections is also often modeled more effectively with contact networks than mass-action models, because of non-randomness and heterogeneity in sexual contact patterns. An infected individual with many contacts may be more likely to spark a significant outbreak than an infected individual with fewer contacts. A study of 2,810 randomly sampled Swedes aged 18-74 years found that the number of sexual interactions per person approximately followed a power-law distribution, indicating significant heterogeneity in sexual activity

Contact network models also allow straightforward analyses of epidemiological dynamics and intervention events occurring at particular nodes. For example, the approach has been used to optimize the distribution of limited resources such as vaccines and antiviral drugs within a population

Although there are powerful mathematical methods for analytically estimating the dynamics of epidemics in complex contact networks

In the chain-binomial model, time is measured in arbitrary, discrete units. Simulations are parameterized with the number of time steps individuals remain infected, and the per-time-step transmissibility. The chain-binomial model may be used to generate epidemic curves (i.e., incidence time series data), identify chains of transmission, estimate the probability of an epidemic, and assess the distribution of epidemic sizes.

Percolation simulations give each infected individual one opportunity to spread disease to each of their susceptible contacts, but do so in an arbitrary, non-chronological order. One approach is to use a probabilistic breadth-first traversal of the network. In the simplest case, transmissibility is the same for all edges. One approach is to maintain four dynamic lists of nodes: (a) susceptible, (b) newly-infected, (c) currently-infected, and (d) recovered. At the beginning of a typical simulation, one or more nodes will be placed on the currently-infected list, and the remaining nodes will be on the susceptible list. Then the following two-step procedure is repeated until the currently-infected list is empty.

(1) For each currently-infected node, a uniform random number between zero and one is generated for each edge that connects the node to a susceptible neighbor. If and only if the random number for a given edge is less than the transmissibility, then the susceptible neighbor moves from the susceptible list to the newly-infected list.

(2) After all currently-infected nodes have been tested in this way for transmission, the nodes in the currently-infected list are moved to the recovered list and the nodes in the newly-infected list are moved to the currently-infected list.

A breadth-first approach like this will result in a coarse approximation of the epidemic curve, similar to a chain-binomial simulation with an infectious period of one unit. Percolation simulations tend not to predict realistic chains of transmission, since new infections occur in cohorts, and the order of transmission events is based on arbitrary orderings of each cohort. The percolation algorithm above can be modified slightly to simulate a chain-binomial model: rather than testing each edge from an infected to a susceptible node once, we test each edge

While the field of contact network epidemiology is growing rapidly, it still lacks a flexible, user-friendly programming toolkit for generating contact networks, analyzing their structure and simulating the spread of disease through them. There are a few freely-available libraries for simulating and analyzing networks, but they are suboptimal for epidemiological research, particularly for novice programmers. Specifically, NetworkX

Here, we introduce EpiFire, an applications programming interface (API) implemented in C++. EpiFire includes a fast and efficient library for generating networks with a specified degree distribution, measuring fundamental network characteristics, and performing percolation and chain-binomial simulations of SIR (susceptible-infected-recovered) disease transmission on generated networks. We have also developed a user-friendly interface that allows the user to perform these functions in a point-and-click environment and provides intuitive graphical results of epidemic simulations.

Although network models can be made to approximate mass-action models by assuming a completely connected network

Implementation

EpiFire comprises two bodies of code that are written in object-oriented C++: the applications programming interface (API) and the graphical user interface (GUI). The EpiFire GUI was developed using the API and Qt

The EpiFire API consists of 20 classes and 2,500 lines of non-whitespace code. The EpiFire GUI consists of 12 classes and 3,500 lines of non-whitespace code.

Installation

EpiFire source code is available from GitHub at

EpiFire API

Functionally, the EpiFire API consists of tools for network generation, network manipulation, network characterization, and epidemic simulation. Programmatically, the API is divided into network, node, edge, and simulation classes. Each class defines a type of variable and its associated attributes and functions. For example, the network class allows users to define a network variable, which can contain one or more node variables that can be connected by one or more edge variables. In the small program below, an undirected network called

The network constructor takes two arguments: an arbitrary text string naming the network, and either Network::Undirected or Network::Directed, which specifies whether all edges are undirected, or some or all may be directional.

Each time the program is run, a different randomly connected graph will be produced. The following is an example of the beginning of the output file:

In this case, Node 0 was connected to Nodes 97, 17, and 21. Node 1 was not connected to any others.

More sophisticated examples, including networks being used in epidemic simulations, can be found in Additional file

**Appendices A and B.**

Click here for file

The network modeling portions of the code (the Network, Node, and Edge classes) can be used with or without the epidemiological code, and may therefore be useful for non-epidemiology applications. The simulation classes provided include three types of finite, stochastic epidemic simulations: percolation and chain-binomial (both network-based), and mass-action. Users may use the provided simulation classes or may create derived classes based on them. For example, the base class for percolation simulations, called Percolation_Sim assumes a disease with susceptible-infectious-recovered states. A simple derived simulation class can be created that inherits almost all the functionality of Percolation_Sim, but that uses an alternate progression of states. An example of a derived simulation using the susceptible-exposed-infectious-recovered state progression (SEIR_Percolation_Sim.h) can be found in the research directory provided with the source code.

Networks may be constructed explicitly by reading in an edgelist file, or adding individual nodes and specifying their connections. Networks can also be constructed implicitly by using one of the network generators provided. Generators for ring and square lattice networks are provided, as well as three random network generators: the Erdős-Rényi model

Networks that are generated via the configuration model can contain edges that are usually undesirable in epidemiological models. Pairs of nodes may be randomly connected by two or more edges, and nodes may be “connected” to themselves by edges going to and from the same node. These edges, called parallel edges and self-loops respectively, may be removed using the provided “lose-loops” function (Additional file

Random numbers are generated using the Mersenne Twister algorithm

Percolation and chain-binomial pseudocode

EpiFire provides epidemic simulators using the percolation and chain-binomial models, represented as pseudocode below. Both pseudocode functions take a network as an argument and return final epidemic size. The most recent implementations, including additional functions for the simulators, can be found online

**Percolation(****
network
**

**foreach**
**in**

**while**

**foreach**

**if**
**and**

**return**

Appendix B2 of Additional file

Appendix B3 of Additional file

**Chain_binomial(****
network
**

**foreach**
**in**

**while**

**foreach**
**in**

**while**

**if**

**break**

**else**

**while**

**return**

**Infect_node(****
current_time, node
**

**foreach**

**if**

**if**

Analytic calculations of epidemic and network quantities

Given a degree distribution for a network and a transmissibility for a pathogen, the EpiFire API includes functions that calculate the expected epidemic threshold for the network (the critical transmissibility above which epidemics are possible), the basic reproductive rate of the pathogen in that network (_{
0
}). EpiFire GUI further calculates expected epidemic size under network and mass-action assumptions. All of the network calculations assume the configuration network model, such that the network is a random draw from all randomly connected networks with the specified degree distribution. Calculations, unless otherwise noted, are adapted from Meyers (2007)

The epidemic threshold for a network is a critical transmission probability (along edges) below which outbreaks are expected to fizzle out and above which large epidemics are possible, but not guaranteed. Technically, in an infinite network, outbreaks below the epidemic threshold will reach only a finite number of nodes, while outbreaks above the threshold can either be finite or infect a fraction of the network including an infinite number of nodes. This value is a function of the network structure and corresponds exactly to an _{0} value of 1; given by

where _{
k
} is the fraction of nodes having degree

The expected basic reproductive rate is the expected number of neighbors that will be infected by each infectious node early in an epidemic, and is equal to the ratio of the actual transmissibility to the critical transmissibility, given by

The expected epidemic size is then given by

where

We also provide a function that calculates the expected final epidemic size in a mass-action model, given a value of _{0}

where _{0} is the fraction of individuals who are susceptible at the start of the epidemic. The expected epidemic sizes under both the mass action and network models are solved numerically using the bisection method

By calculating and comparing the network and mass-action expectations for an epidemic size of a specific network-pathogen combination (done automatically in the EpiFire GUI), one can assess the epidemiological impact of the network structure. Large differences in the values of network and mass-action expectations suggest that network structure plays an important role in disease transmission, and that traditional compartmental models may not be adequate.

Since percolation and chain binomial transmissibilities are per-time-unit and per-infectious-period probabilities, respectively, when users switch between simulation types the transmissibility parameter is recalculated accordingly.

One important property of networks is clustering, a measure of whether nodes exist in well-interconnected groups. EpiFire implements the transitivity clustering coefficient

where triangles is the number of sets of nodes A, B, and C such that all three are interconnected, and triples is the number of sets of nodes A’, B’, and C’ such that B’ is connected to A’ and C’.

Results

EpiFire is an applications programming interface (API) implemented in C++, designed to efficiently generate networks with a specified degree distribution, measure fundamental network characteristics, and perform percolation and chain-binomial simulations of SIR disease transmission for generated networks. EpiFire also includes a continuous time, stochastic mass-action simulation class for creating hybrid models and/or comparing the results of mass-action and network-based simulations.

EpiFire allows users to develop efficient epidemic simulations in C++ by providing a high-level API for running simulations and manipulating the underlying contact networks in network-based models. The following examples demonstrate simple use-cases.

Example 1: Percolation simulation (API)

This percolation simulation is performed using a random network constructed using the Erdős-Rényi algorithm with 10,000 nodes and mean degree 5. The probability of transmission between an infected node and a susceptible neighbor is 0.25, and the epidemic begins with 10 infected nodes (selected randomly without replacement).

Sample output:

The output from this example is the expected value of _{0} if an epidemic occurs (see Implementation) and the total number of individuals infected during the epidemic. By running the simulation many times, we can generate a distribution of epidemic sizes. Alternatively, to generate an epidemic curve, we can report the size of the infected cohort after each round of transmission (see Additional file

Example 1 required 0.06 sec (avg) and 5.45 MB (max) of system memory. The test system was a Dell Precision Workstation 490 with two Intel Xeon 5140 processors and 4 GB of RAM running 32-bit Ubuntu 10.04 LTS. EpiFire was compiled using gcc version 4.4.3 with O2 optimization. Most of the time is spent constructing the random network; the simulation itself only requires 0.5 ms (avg). Depending on the intended application, it may be acceptable to generate and reuse a single network for many simulations, greatly reducing the time required. Users should note that when reusing a network,

Additional API examples

Appendix A2 of Additional file

Comparison with NetworkX and igraph

EpiFire is not intended to replace other network APIs, which were developed to solve different problems. To compare these diverse APIs, we consider one of their common functions: generation of an Erdős-Rényi random network. There are several algorithms that will generate random networks; we chose the most efficient algorithm available in each API when generating a 100,000 node network with a Poisson degree distribution with Poisson parameter (mean) equal to ten. EpiFire requires much less memory and running time than the user-friendly NetworkX, and somewhat less memory and time than igraph (Table

Memory required (MB)

Time required (sec)

Performance comparison with NetworkX and igraph, generating random networks with 100,000 nodes, Poisson(10) degree distribution. Network generation in NetworkX was performed using the fast_gnp_random_graph() function.

EpiFire API

45

0.830

NetworkX

1,400

15.4

Igraph

93

0.995

Overview of EpiFire GUI

Although the EpiFire API provides great flexibility for creating custom simulations, it requires some background in programming. As a demonstration of some of the capabilities of the EpiFire API, we present the EpiFire graphical user interface (GUI), which allows users to generate and analyze several common classes of random networks and conduct chain-binomial and percolation SIR simulations on contact networks in a point-and-click environment with intuitive, automatically generated figures. The EpiFire GUI requires no programming to create and analyze networks and run stochastic simulations on those networks.

Main window

The application's main window (Figure

EpiFire GUI main application window

**EpiFire GUI main application window.**

Network parameterization

By default, the tab labeled “Step 1: Choose a network” is active. Users choose whether to import a network from a file or to randomly generate a network. The import format is an edge-list file, with each edge represented as a single line containing the names of the connected nodes, separated by a comma. Currently only undirected networks are supported by the EpiFire GUI. If users choose to generate a network, they may specify the desired number of nodes, the degree distribution type, and relevant parameters for the degree distribution. Generated networks are connected randomly using the configuration model with the constraint that no pair of nodes is connected by more than one edge, and no edges loop back to connect a node to itself. Users may select Poisson, exponential, power law, urban, or fixed degree distributions. Degree distributions are right-truncated at

Simulator parameterization

By clicking the tab labeled “Step 2: Design a simulation,” users may specify simulation parameters. Epidemics can be simulated under chain-binomial and percolation models. Chain-binomial is the default because it produces epidemic curves with finer temporal resolution, although percolation simulations run faster and will produce the same distribution of final epidemic sizes. Both simulators allow users to specify a transmissibility, the number of infections that should start the epidemic, and the number of simulation repetitions that should be performed. Chain-binomial simulations are also parameterized with an infectious period, defined as the number of time-steps an infected individual will remain infected; when this is set to 1, the chain-binomial and percolation models produce equivalent results. Users may also choose whether epidemic data is retained between runs or deleted prior to each new simulation. This determines which data are included in the plots.

Theoretical predictions

The EpiFire GUI also displays the expected _{0} for the current network and epidemic simulation parameters. Epidemics will not occur when _{0} is less than one, but may occur otherwise. Given this expected _{0}, EpiFire calculates expected epidemic sizes under mass-action and configuration model assumptions.

Control panel

The control panel allows users to clear the current network or the current epidemic data from memory, restore the default settings, open the help dialog, generate and load networks, and run a simulation with the specified parameters. Note that when “Generate Network” or “Import Edge List” is clicked, any previous network is automatically cleared, and the “Run Simulation” button is disabled unless a network has been created.

Status log

The status log provides users with updates, including the status of network generation, warnings about incompatible parameters, current simulation number, and final epidemic size.

The right pane of the EpiFire GUI is divided into three plots (initially blank) of simulation results. These plots may be resized by resizing the main window, or by clicking and dragging the horizontal dividers between the top and middle, and the middle and bottom plots. All of the plots created by the EpiFire GUI can be exported by double-clicking the plot, and the data used to generate the plots can be exported by right-clicking.

Node state plot

The top plot shows the progression of states of the first 100 nodes in the network, or all nodes if the network has fewer than 100 nodes. The horizontal axis represents the duration of the epidemic, and each horizontal band represents the states of a particular node. Blue denotes susceptible, red is infectious and yellow is recovered. The range of the horizontal axis is the total duration, in time-steps, of the most recent simulation run. These plots may provide visual insights into synchrony between node states and the relative amount of time nodes spend in each state.

Epidemic curve plot

The middle plot displays the number of individuals in the infectious state at each time step. The most recent epidemic curve (representing the most recent simulation run) is shown in red. If users choose to retain data between simulation runs, then all previous simulations are shown in semi-transparent gray. These gray data points effectively become a density plot, so that after many runs, users can see the range of possible outcomes and what a typical epidemic might look like.

Histogram of epidemic sizes

The bottom plot shows how many times epidemics of a given size class have been observed, where epidemic size is defined as the total number of nodes in the recovered state at the end of the epidemic (when there are no remaining infectious nodes). As more simulation runs are compiled, the histogram of observed epidemic sizes more accurately estimates the distribution of possible epidemic sizes.

Network visualization window

Displaying large networks is difficult, especially those with random connections that are uncorrelated with any two-dimensional location. If networks are small (< 100 nodes), it may, however, be useful to display their structure. We provide a “Show network plot” option within the “Plot” menu, which uses a variant of the Fruchterman-Reingold algorithm

Network analysis window

The “Network” menu includes a “Network analysis” option. If there is a network in memory, a new pop-up window (Figure

"Analysis of current network" dialog

**"Analysis of current network" dialog.**

The network analysis window is particularly useful for comparing networks with different degree distributions, and for elucidating unexpected simulation results. For example, a simulation with a very high expected _{0} may fail to create correspondingly large epidemics if the underlying network has multiple components.

Simulation results analysis window

Under the “Results” menu is the “Simulation results analysis” option. Once results have been generated, users can open a new window (Figure

"Analysis of simulation results" dialog

**"Analysis of simulation results" dialog.**

Example 2: Percolation simulation (GUI)

The simulation in Example 1 can also be performed using the GUI according to the instructions below. Default settings are assumed unless indicated.

1. Click “Generate Network” or press CTRL-G to create a 10,000 node network with a Poisson degree distribution with expected mean degree equal to 5 (these are the default settings on the “Choose a network” tab).

2. Click the “Design a simulation” tab or press ALT-2

i. Change “Simulation type” to Percolation

ii. Change “Transmissibility” to 0.25

iii. Change “Initially infected” to 10

3. Click “Run Simulation” or press ENTER to run the simulation.

The epidemic size will be printed in the log window in the lower left. Figures characterizing the simulation run will be automatically generated on the right, including a node-state plot, an epidemic curve plot, and an epidemic size histogram. The epidemic size histogram will better approximate the true final size distribution as additional simulations are performed.

Discussion

Developing epidemiological simulations that scale effectively to millions of individuals can be challenging. The open source API of EpiFire provides a transparent, logical framework that can be used for standard percolation, chain-binomial, or mass-action SIR simulations. Furthermore, it can be extended to create new, specialized types of simulations, such as networks that change in response to epidemic dynamics, or multi-pathogen simulations where co-infection changes transmission probabilities. EpiFire allows for hybridized models and alternative network interpretations, such as using a mass-action model for within-city dynamics and a network model for between-city dynamics

Several other publically-available software projects have overlapping functionality. However, none have been written specifically for contact network epidemiology with the intent of providing a common, extensible toolkit for researchers to use to develop their own models.

Although EpiFire is intended as an API for contact network epidemiology, the network class is independent from the simulation classes, and is thus applicable to other types of network-based modeling, such as metabolite interaction networks

The EpiFire graphical interface provides a user-friendly toolkit for performing network-based SIR epidemic simulations and gaining an intuitive understanding of the impact of network structure on infectious disease dynamics. The most obvious applications are pedagogical: the straight-forward interface and rapid feedback allow users to learn first-hand the consequences of changing epidemic and network parameters. EpiFire has been used in courses at the University of Texas at Austin and at the Summer Institute in Statistics and Modeling in Infectious Diseases (SISMID) at the University of Washington

We are currently adding support for deterministic, ordinary differential equation models, which will include derived classes implementing the standard mass-action SIR model, and a network-based SIR model

Conclusions

Efficient and easy-to-use software plays a critical role in computational biology research. Contact network approaches in epidemiology provide sophisticated analytical and efficient computational methods, but these can be technically challenging and time consuming to implement. Currently no open-source toolkit is available for facilitating contact network epidemiology research. We present EpiFire, an applications programming and graphical interface, available for Windows, OS X, and Linux online at sourceforge.net/projects/epifire. As the field of contact network epidemiology matures, so should its mathematical and computational toolkit. Open-source code libraries like EpiFire help to avoid programming mistakes, increase the transparency of analyses, and reduce barriers between the conception and implementation of ideas.

Availability and requirements

Project name: EpiFire

Project home page:

Operating systems: Platform independent

Programming language: C++

Other requirements: g++ 4.5 for API; g++ 4.5 and Qt 4.7 for GUI

License: GNU GPLv3

Any restrictions for use by non-academics: none

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

TH conceived of and implemented the project and drafted the manuscript. EM assisted in developing the programmatic and graphical interfaces. LAB developed a beta version of the graphical interface. LAM helped to implement analytical methods, refine the graphical interface and draft the manuscript. AG helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We wish to thank Erik Volz for suggesting the development of a graphical interface, and Claus Wilke for suggesting ways to make the software more user-friendly.

Funding was provided by National Institute of General Medical Sciences MIDAS grant U01GM087719 and National Science Foundation grant DEB-0749097 to LAM. This material is based in part upon work supported by the National Science Foundation under Cooperative Agreement No. DBI-0939454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.