Abstract
Background
Contact network models have become increasingly common in epidemiology, but we lack a flexible programming framework for the generation and analysis of epidemiological contact networks and for the simulation of disease transmission through such networks.
Results
Here we present EpiFire, an applications programming interface and graphical user interface implemented in C++, which includes a fast and efficient library for generating, analyzing and manipulating networks. Network-based percolation and chain-binomial simulations of susceptible-infected-recovered disease transmission, as well as traditional non-network mass-action simulations, can be performed using EpiFire.
Conclusions
EpiFire provides an open-source programming interface for the rapid development of network models with a focus on contact network epidemiology. EpiFire also provides a point-and-click interface for generating networks, conducting epidemic simulations, and creating figures. This interface is particularly useful as a pedagogical tool.
Background
Epidemiological models traditionally assume mass-action dynamics: individual hosts in a population have identical contact rates and are well-mixed, such that any given pair may interact and transmit disease with equal probability [1,2]. Compartmental susceptible-infected-recovered (SIR) models implicitly assume mass-action interactions for infinite populations. However, the mass-action assumption is unlikely to be strictly valid in most instances. Although there are some settings in which mass-action models provide reasonable approximations, there are others in which it is essential to consider the heterogeneous contact patterns that underlie disease transmission [3,4].
Contact network epidemiology [5,6] explicitly models disease transmission through populations with heterogeneous contact patterns. Host populations are represented as networks of individuals (the nodes) and the contacts through which disease can spread (the edges between nodes). The definition of a disease-causing contact depends on the disease. For influenza, edges represent the potential for droplet or contact transmission, e.g., direct interactions such as prolonged proximity or food sharing [7], or indirect transmission of fomites persisting in the environment [8]. A node's degree is the number of contacts an individual has, and is one indication of an individual's epidemiological importance relative to other individuals in the population. The degree distribution of a network can play a critical role in shaping epidemic dynamics [4]. By modeling disease transmission probabilistically, contact network models can predict the expected epidemic size, the likelihood that an epidemic will occur given an introduction, and with some methods, the dynamics of an outbreak and likely chains of transmission through the population [9].
The 2003 SARS outbreak in China illustrates one limitation of mass-action models. Early estimates of the basic reproduction number (R_{0}) for SARS predicted that 120 days after the disease was introduced in China, between 30,000 and 10 million individuals would have been infected in China [4]. In fact, only 792 SARS cases were reported [10]. This discrepancy stems, at least in part, from the assumption that disease transmission patterns for the entire Chinese population would be similar to the transmission patterns in the apartment building in Hong Kong and hospital in Singapore from which the early estimates were made. Such heterogeneity in contact patterns lends itself to a network approach, in which nodes can be connected to explicitly represent the diverse patterns of interactions that occur within human populations.
The spread of sexually transmitted infections is also often modeled more effectively with contact networks than mass-action models, because of non-randomness and heterogeneity in sexual contact patterns. An infected individual with many contacts may be more likely to spark a significant outbreak than an infected individual with fewer contacts. A study of 2,810 randomly sampled Swedes aged 18-74 years found that the number of sexual interactions per person approximately followed a power-law distribution, indicating significant heterogeneity in sexual activity [11]. A smaller study of sexual interactions among adolescents in a Midwest US town revealed long chains of contacts and fewer cycles (edges forming polygons) than would be expected by chance, patterns that can be readily captured by network models [12].
Contact network models also allow straightforward analyses of epidemiological dynamics and intervention events occurring at particular nodes. For example, the approach has been used to optimize the distribution of limited resources such as vaccines and antiviral drugs within a population [13-15], and to identify critical bridge groups connecting relatively disjoint parts of a high-risk population [16].
Although there are powerful mathematical methods for analytically estimating the dynamics of epidemics in complex contact networks [5,6], simulations of disease transmission through networks are also critical to the field. They allow us to test the validity of mathematical approximations and serve as the primary modeling approach when mathematical approximations cannot adequately describe the complexity of a population. There are two widely used approaches to simulating disease transmission through networks: the Reed-Frost chain-binomial model [17,18] and the percolation model [19]. Although they can be made arbitrarily complex, these models typically represent individual hosts as nodes with discrete disease states, and propagate infection along edges from infected to susceptible nodes according to the probability of disease transmission, called transmissibility, which may either be fixed or determined by a function.
In the chain-binomial model, time is measured in arbitrary, discrete units. Simulations are parameterized with the number of time steps individuals remain infected, and the per-time-step transmissibility. The chain-binomial model may be used to generate epidemic curves (i.e., incidence time series data), identify chains of transmission, estimate the probability of an epidemic, and assess the distribution of epidemic sizes.
Percolation simulations give each infected individual one opportunity to spread disease to each of their susceptible contacts, but do so in an arbitrary, non-chronological order. One approach is to use a probabilistic breadth-first traversal of the network. In the simplest case, transmissibility is the same for all edges. Such a traversal can be implemented by maintaining four dynamic lists of nodes: (a) susceptible, (b) newly-infected, (c) currently-infected, and (d) recovered. At the beginning of a typical simulation, one or more nodes will be placed on the currently-infected list, and the remaining nodes will be on the susceptible list. Then the following two-step procedure is repeated until the currently-infected list is empty.
(1) For each currently-infected node, a uniform random number between zero and one is generated for each edge that connects the node to a susceptible neighbor. If and only if the random number for a given edge is less than the transmissibility, then the susceptible neighbor moves from the susceptible list to the newly-infected list.
(2) After all currently-infected nodes have been tested in this way for transmission, the nodes in the currently-infected list are moved to the recovered list and the nodes in the newly-infected list are moved to the currently-infected list.
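The two-step procedure above can be sketched in standalone C++ as follows. This is a minimal illustration rather than EpiFire's implementation: the adjacency-list type, the function name `percolation_curve`, and the use of `std::mt19937` are assumptions made for the example. Node states stand in for the four lists, and the function returns the size of each infected cohort, i.e., a coarse epidemic curve.

```cpp
#include <random>
#include <vector>

// Hypothetical representation: adj[i] lists the neighbors of node i.
using AdjacencyList = std::vector<std::vector<int>>;

enum class State { Susceptible, Infectious, Recovered };

// Runs one cohort-based percolation epidemic. seeds holds the distinct nodes
// that start infected; T is the per-edge transmissibility. Returns the number
// of nodes in each successive infected cohort (seeds first).
std::vector<int> percolation_curve(const AdjacencyList& adj,
                                   const std::vector<int>& seeds,
                                   double T, std::mt19937& rng) {
    std::vector<State> state(adj.size(), State::Susceptible);
    std::vector<int> currently_infected = seeds;
    for (int s : seeds) state[s] = State::Infectious;

    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<int> curve;
    while (!currently_infected.empty()) {
        curve.push_back(static_cast<int>(currently_infected.size()));
        std::vector<int> newly_infected;
        // Step 1: test every edge from an infectious node to a susceptible neighbor.
        for (int node : currently_infected) {
            for (int neighbor : adj[node]) {
                if (state[neighbor] == State::Susceptible && unif(rng) < T) {
                    state[neighbor] = State::Infectious;  // leaves the susceptible pool
                    newly_infected.push_back(neighbor);
                }
            }
        }
        // Step 2: the current cohort recovers and the new cohort takes its place.
        for (int node : currently_infected) state[node] = State::Recovered;
        currently_infected.swap(newly_infected);
    }
    return curve;
}
```

Summing the returned vector gives the final epidemic size for that run; repeating the call with the same generator yields a distribution of sizes.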
A breadth-first approach like this will result in a coarse approximation of the epidemic curve, similar to a chain-binomial simulation with an infectious period of one unit. Percolation simulations tend not to predict realistic chains of transmission, since new infections occur in cohorts, and the order of transmission events is based on arbitrary orderings of each cohort. The percolation algorithm above can be modified slightly to simulate a chain-binomial model: rather than testing each edge from an infected to a susceptible node once, we test each edge n times, where n is the length of the infectious period or the number of time steps until transmission occurs, whichever occurs first. If parameterized appropriately, the two models will yield the same epidemic probability and final size distribution. The chain-binomial model yields smoother, more realistic epidemic curves, while the percolation model is more computationally efficient.
While the field of contact network epidemiology is growing rapidly, it still lacks a flexible, user-friendly programming toolkit for generating contact networks, analyzing their structure and simulating the spread of disease through them. There are a few freely available libraries for simulating and analyzing networks, but they are suboptimal for epidemiological research, particularly for novice programmers. Specifically, NetworkX [20], implemented in Python, is straightforward but slow, whereas igraph [21], implemented in C, is faster but less user-friendly. The R package statnet [22] is more specialized, focusing on statistical analysis of exponential-family random graphs. None of these packages provides epidemiological simulations or functions for calculating important epidemiological values. Other software packages provide valuable disease- or population-specific simulators (e.g., for pertussis [23], HIV [24], influenza [25,26], urban populations [27], metapopulation networks [28]), but lack a flexible framework for users to define alternative disease models and population structures.
Here, we introduce EpiFire, an applications programming interface (API) implemented in C++. EpiFire includes a fast and efficient library for generating networks with a specified degree distribution, measuring fundamental network characteristics, and performing percolation and chain-binomial simulations of SIR (susceptible-infected-recovered) disease transmission on generated networks. We have also developed a user-friendly interface that allows the user to perform these functions in a point-and-click environment and provides intuitive graphical results of epidemic simulations.
Although network models can be made to approximate mass-action models by assuming a completely connected network [29], it typically does not make sense to do so. Mass-action models are computationally very efficient and network models become computationally more demanding as the number of nodes and edges increases. Thus, EpiFire also includes a continuous time, stochastic mass-action simulation class to allow users to create hybrid models or to compare the results of mass-action and network-based models.
Implementation
EpiFire comprises two bodies of code that are written in object-oriented C++: the applications programming interface (API) and the graphical user interface (GUI). The EpiFire GUI was developed using the API and Qt [30], and allows non-programmers to generate networks, perform epidemic simulations, and export figures and data. We describe the EpiFire GUI in more detail in the Results section below. The entire EpiFire code base is open source, licensed under GNU GPLv3.
The EpiFire API consists of 20 classes and 2,500 lines of non-whitespace code. The EpiFire GUI consists of 12 classes and 3,500 lines of non-whitespace code.
Installation
EpiFire source code is available from GitHub at http://github.com/tjhladish/EpiFire/ or http://epifire.com. Users who have installed Git version control software (open-source, available at http://git-scm.com/) may create a local copy of the EpiFire repository by executing, without quotes, "git clone git://github.com/tjhladish/EpiFire.git" on the command line. Microsoft Windows and Mac OS X users can download precompiled binaries from http://sourceforge.net/projects/epifire/.
EpiFire API
Functionally, the EpiFire API consists of tools for network generation, network manipulation, network characterization, and epidemic simulation. Programmatically, the API is divided into network, node, edge, and simulation classes. Each class defines a type of variable and its associated attributes and functions. For example, the network class allows users to define a network variable, which can contain one or more node variables that can be connected by one or more edge variables. In the small program below, an undirected network called my_network is created, and then populated with 100 nodes. The nodes are randomly connected with edges such that on average, each node will be connected to five others. Finally, the structure of the network is written out as an edge list in the comma-separated-value format.
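#include <Network.h>

int main() {
    // Create an empty undirected network
    Network my_network("example network", Network::Undirected);
    my_network.populate(100);                 // add 100 nodes
    my_network.rand_connect_poisson(5);       // random edges, mean degree 5
    my_network.write_edgelist("output.csv");  // write the edge list as CSV
    return 0;
}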
The network constructor takes two arguments: an arbitrary text string naming the network, and either Network::Undirected or Network::Directed, which specifies whether all edges are undirected, or some or all may be directional.
Each time the program is run, a different randomly connected graph will be produced. The following is an example of the beginning of the output file:
0,97
0,17
0,21
1
2,49
2,51
2,36
2,73
2,66
3,45
In this case, Node 0 was connected to Nodes 97, 17, and 21. Node 1 was not connected to any others.
More sophisticated examples, including networks being used in epidemic simulations, can be found in Additional file 1 in the examples directory provided with the source code.
Additional file 1. Appendices A and B.
The network modeling portions of the code (the Network, Node, and Edge classes) can be used with or without the epidemiological code, and may therefore be useful for non-epidemiology applications. The simulation classes provided include three types of finite, stochastic epidemic simulations: percolation and chain-binomial (both network-based), and mass-action. Users may use the provided simulation classes or may create derived classes based on them. For example, the base class for percolation simulations, called Percolation_Sim, assumes a disease with susceptible-infectious-recovered states. A simple derived simulation class can be created that inherits almost all the functionality of Percolation_Sim, but that uses an alternate progression of states. An example of a derived simulation using the susceptible-exposed-infectious-recovered state progression (SEIR_Percolation_Sim.h) can be found in the research directory provided with the source code.
Networks may be constructed explicitly by reading in an edge-list file, or adding individual nodes and specifying their connections. Networks can also be constructed implicitly by using one of the network generators provided. Generators for ring and square lattice networks are provided, as well as three random network generators: the Erdős-Rényi model [31], resulting in approximately Poisson degree distributions; the configuration model [32], which generates random networks with a user-specified degree distribution; and the Watts-Strogatz “small-world” network generation model [33].
Networks that are generated via the configuration model can contain edges that are usually undesirable in epidemiological models. Pairs of nodes may be randomly connected by two or more edges, and nodes may be “connected” to themselves by edges going to and from the same node. These edges, called parallel edges and self-loops respectively, may be removed using the provided “loseloops” function (Additional file 1: Appendix B). This function uses a novel algorithm to reconnect the affected edges in a randomized way that preserves the degree sequence of the network. This approach may introduce some non-randomness to the network structure, but the improvement in algorithmic complexity over competing methods is significant [34].
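To illustrate how such edges arise, the sketch below wires a network by the textbook stub-matching form of the configuration model and then counts the self-loops and parallel edges it produced. It is a standalone illustration, not EpiFire's generator, and it does not reproduce the reconnection algorithm described above; the names `configuration_model`, `EdgeDefects`, and `count_defects` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // stored with first <= second

// Stub matching: node i contributes degrees[i] stubs, the stubs are shuffled,
// and consecutive stubs are paired into edges. Assumes non-negative degrees
// whose sum is even.
std::vector<Edge> configuration_model(const std::vector<int>& degrees,
                                      std::mt19937& rng) {
    std::vector<int> stubs;
    for (std::size_t i = 0; i < degrees.size(); ++i)
        stubs.insert(stubs.end(), static_cast<std::size_t>(degrees[i]),
                     static_cast<int>(i));
    std::shuffle(stubs.begin(), stubs.end(), rng);

    std::vector<Edge> edges;
    for (std::size_t j = 0; j + 1 < stubs.size(); j += 2)
        edges.emplace_back(std::min(stubs[j], stubs[j + 1]),
                           std::max(stubs[j], stubs[j + 1]));
    return edges;
}

struct EdgeDefects {
    int self_loops;      // edges joining a node to itself
    int parallel_edges;  // edges duplicating an earlier edge between two nodes
};

EdgeDefects count_defects(const std::vector<Edge>& edges) {
    EdgeDefects defects{0, 0};
    std::set<Edge> seen;
    for (const Edge& e : edges) {
        if (e.first == e.second) ++defects.self_loops;
        else if (!seen.insert(e).second) ++defects.parallel_edges;
    }
    return defects;
}
```

Because every stub is used exactly once, the generated edges always realize the requested degree sequence (a self-loop contributes two to its node's degree); any repair step must preserve that property.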
Random numbers are generated using the Mersenne Twister algorithm [35] as implemented by Wagner, available at http://www-personal.umich.edu/~wagnerr/MersenneTwister.html.
Percolation and chain-binomial pseudocode
EpiFire provides epidemic simulators using the percolation and chain-binomial models, represented as pseudocode below. Both pseudocode functions take a network as an argument and return final epidemic size. The most recent implementations, including additional functions for the simulators, can be found online [36,37]. In the percolation pseudocode below, T denotes the transmissibility of the pathogen, that is, the probability that transmission will occur between an infectious node and a susceptible neighbor.
Percolation(network, T):
    infected_queue ← empty list
    for each node in network:
        set state of node to "susceptible"
    first_infected ← random node from network
    set state of first_infected to "infectious"
    append first_infected to infected_queue
    while infected_queue is not empty:
        node ← remove first element from infected_queue
        for each neighbor of node:
            rand ← uniform random number between 0 and 1
            if neighbor is "susceptible" and rand < T:
                set state of neighbor to "infectious"
                append neighbor to infected_queue
        set state of node to "recovered"
    epidemic_size ← count of nodes in "recovered" state
    return epidemic_size
Appendix B2 of Additional file 1 provides a version of the percolation algorithm that produces an epidemic curve. In practice, it may be convenient to use integers as node states rather than text strings. In the chain binomial algorithm below, susceptible nodes have a value of 0, recovered nodes have a value of -1, and infectious nodes have a value equal to the number of days they have been infectious.
Appendix B3 of Additional file 1 provides a simple chain binomial function that performs one comparison per time unit per infectious node. Here, we describe a more efficient implementation. Instead of checking whether transmission occurs to each neighbor at each time step, we can determine the time until transmission along each edge. Because each transmission attempt can be considered a Bernoulli trial, we can determine when transmission will occur by sampling from a truncated geometric distribution with probability of "success" T_cb (chain binomial transmissibility) and support on {1, 2, . . . , gamma + 1}, where gamma is the infectious period. If the deviate happens to be gamma + 1, then transmission never occurs. In the pseudocode below, transQ is a priority queue of transmission events, sorted by time, least to greatest.
Chain_binomial(network, T_cb, gamma):
    transQ ← empty priority queue of [time, node] pairs
    infected_list ← empty list
    for each node in network:
        set state of node to 0
    current_time ← 0
    first_infected ← random node from network
    Infect_node(current_time, first_infected, T_cb, gamma, transQ, infected_list)
    while infected_list is not empty:
        increment current_time
        for each node in infected_list:
            increment state of node
        while infected_list is not empty:
            if state of infected_list[0] ≤ gamma:
                break
            else:
                set state of infected_list[0] to -1
                remove first element from infected_list
        while transQ is not empty and time of transQ[0] ≤ current_time:
            event ← remove first element from transQ
            if state of node of event is 0:
                Infect_node(time of event, node of event, T_cb, gamma, transQ, infected_list)
    epidemic_size ← count of nodes in -1 state
    return epidemic_size

Infect_node(current_time, node, T_cb, gamma, transQ, infected_list):
    set state of node to 1
    append node to infected_list
    for each neighbor of node:
        if state of neighbor is 0:
            rand ← geometric_random_number(T_cb, gamma), see main text
            if rand ≤ gamma:
                append [current_time + rand, neighbor] to transQ
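The truncated geometric draw used by Infect_node can be sketched in standalone C++ as below. The function name `time_to_transmission` and the choice of `std::geometric_distribution` are assumptions for illustration; EpiFire's own random number routines may differ. The function returns a value in {1, ..., gamma} for the time step at which transmission occurs, or gamma + 1 if it never occurs.

```cpp
#include <random>

// Hypothetical helper: samples the time until transmission along one edge,
// given per-time-step transmissibility T_cb and infectious period gamma.
int time_to_transmission(double T_cb, int gamma, std::mt19937& rng) {
    if (T_cb >= 1.0) return 1;          // transmission is certain on the first step
    if (T_cb <= 0.0) return gamma + 1;  // transmission can never occur
    // std::geometric_distribution counts failures before the first success,
    // so (failures + 1) is the index of the first successful Bernoulli trial.
    std::geometric_distribution<int> failures_before_success(T_cb);
    const int failures = failures_before_success(rng);
    return failures < gamma ? failures + 1 : gamma + 1;  // truncate at gamma + 1
}
```

Under this scheme the probability of no transmission along an edge is (1 - T_cb)^gamma, which is the same probability obtained by testing the edge once per time step for gamma steps.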
Analytic calculations of epidemic and network quantities
Given a degree distribution for a network and a transmissibility for a pathogen, the EpiFire API includes functions that calculate the expected epidemic threshold for the network (the critical transmissibility above which epidemics are possible) and the basic reproductive rate of the pathogen in that network (R_{0}). The EpiFire GUI further calculates expected epidemic size under network and mass-action assumptions. All of the network calculations assume the configuration network model, such that the network is a random draw from all randomly connected networks with the specified degree distribution. Calculations, unless otherwise noted, are adapted from Meyers (2007) [6], which provides additional mathematical details.
The epidemic threshold for a network is a critical transmission probability (along edges) below which outbreaks are expected to fizzle out and above which large epidemics are possible, but not guaranteed. Technically, in an infinite network, outbreaks below the epidemic threshold will reach only a finite number of nodes, while outbreaks above the threshold can either be finite or infect a fraction of the network including an infinite number of nodes. This value, T_{c}, is a function of the network structure and corresponds exactly to an R_{0} value of 1; it is given by
\[ T_{c} = \frac{\sum_{k} k\, p_{k}}{\sum_{k} k(k-1)\, p_{k}} \]
where k is the degree of a node, and p_{k} is the fraction of nodes having degree k.
The expected basic reproductive rate is the expected number of neighbors that will be infected by each infectious node early in an epidemic, and is equal to the ratio of the actual transmissibility T to the critical transmissibility T_{c}, given by
\[ R_{0} = \frac{T}{T_{c}} = T\, \frac{\sum_{k} k(k-1)\, p_{k}}{\sum_{k} k\, p_{k}} \]
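The expected epidemic size S, as a fraction of the network, is then given by
\[ S = 1 - \sum_{k} p_{k} \left(1 + (u - 1)T\right)^{k} \]
where u is the solution to the self-consistency equation
\[ u = \frac{\sum_{k} k\, p_{k} \left(1 + (u - 1)T\right)^{k-1}}{\sum_{k} k\, p_{k}} \]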
We also provide a function that calculates the expected final epidemic size in a mass-action model, given a value of R_{0} [1]:
\[ r_{\infty} = 1 - S_{0}\, e^{-R_{0} r_{\infty}} \]
where r_{\infty} is the expected final epidemic size as a fraction of the population and S_{0} is the fraction of individuals who are susceptible at the start of the epidemic. The expected epidemic sizes under both the mass-action and network models are solved numerically using the bisection method [38].
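As a rough illustration of the network calculations, the standalone C++ sketch below computes the critical transmissibility, the expected R_{0}, and the expected epidemic size by bisection. It assumes the standard configuration-model expressions from Meyers (2007): T_c = <k> / (<k^2> - <k>), R_0 = T / T_c, and S = 1 - sum_k p_k (1 + (u - 1)T)^k with u solving u = sum_k k p_k (1 + (u - 1)T)^(k-1) / <k>. The function names and the input format (a vector whose k-th entry is p_k) are hypothetical and are not EpiFire's API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// p[k] is the fraction of nodes with degree k (assumed input format; the
// mean degree must be positive).

// Critical transmissibility T_c = <k> / (<k^2> - <k>).
double critical_transmissibility(const std::vector<double>& p) {
    double mean_k = 0.0, mean_k_excess = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double k = static_cast<double>(i);
        mean_k += k * p[i];
        mean_k_excess += k * (k - 1.0) * p[i];
    }
    return mean_k / mean_k_excess;
}

// Expected basic reproductive rate R0 = T / T_c.
double expected_R0(const std::vector<double>& p, double T) {
    return T / critical_transmissibility(p);
}

// Expected epidemic size as a fraction of nodes; 0 at or below the threshold.
double expected_epidemic_size(const std::vector<double>& p, double T) {
    if (expected_R0(p, T) <= 1.0) return 0.0;
    double mean_k = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i)
        mean_k += static_cast<double>(i) * p[i];

    // g(u) = [sum_k k p_k (1 + (u - 1)T)^(k-1)] / <k> - u; u is its root in [0, 1).
    auto g = [&](double u) {
        double sum = 0.0;
        for (std::size_t i = 1; i < p.size(); ++i) {
            const double k = static_cast<double>(i);
            sum += k * p[i] * std::pow(1.0 + (u - 1.0) * T, k - 1.0);
        }
        return sum / mean_k - u;
    };

    // Bisection: g >= 0 at u = 0 and g < 0 just below the trivial root u = 1.
    double lo = 0.0, hi = 1.0 - 1e-9;
    for (int iter = 0; iter < 100; ++iter) {
        const double mid = 0.5 * (lo + hi);
        if (g(mid) > 0.0) lo = mid; else hi = mid;
    }
    const double u = 0.5 * (lo + hi);

    double size = 1.0;
    for (std::size_t i = 0; i < p.size(); ++i)
        size -= p[i] * std::pow(1.0 + (u - 1.0) * T, static_cast<double>(i));
    return size;
}
```

For a network in which every node has degree 3, these expressions give T_c = 0.5; with T = 0.75 the self-consistency equation has the closed-form solution u = 1/9, so the expected size is 1 - (1/3)^3 = 26/27. The upper bisection bound excludes u = 1, which always satisfies the self-consistency equation but corresponds to no epidemic.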
By calculating and comparing the network and mass-action expectations for an epidemic size of a specific network-pathogen combination (done automatically in the EpiFire GUI), one can assess the epidemiological impact of the network structure. Large differences in the values of network and mass-action expectations suggest that network structure plays an important role in disease transmission, and that traditional compartmental models may not be adequate.
Since percolation and chain-binomial transmissibilities are per-infectious-period and per-time-step probabilities, respectively, when users switch between simulation types the transmissibility parameter is recalculated accordingly.
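One way to express this conversion, assuming that transmission attempts along an edge are independent Bernoulli trials repeated once per time step over an infectious period of gamma steps, is to equate the probability that transmission ever occurs along the edge. Here T is the transmissibility used in the percolation pseudocode and T_cb the one used in the chain binomial pseudocode; whether the GUI uses exactly this formula is not stated in the text.

```latex
\[
  T = 1 - \left(1 - T_{cb}\right)^{\gamma},
  \qquad
  T_{cb} = 1 - \left(1 - T\right)^{1/\gamma}
\]
```

For example, T_cb = 0.1 with gamma = 5 corresponds to T = 1 - 0.9^5 ≈ 0.41, and with gamma = 1 the two values coincide.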
One important property of networks is clustering, a measure of whether nodes exist in well-interconnected groups. EpiFire implements the transitivity clustering coefficient [39], calculated as
\[ C = \frac{3 \times \mathrm{triangles}}{\mathrm{triples}} \]
where triangles is the number of sets of nodes A, B, and C such that all three are interconnected, and triples is the number of sets of nodes A', B', and C' such that B' is connected to A' and C'.
Results
EpiFire is an applications programming interface (API) implemented in C++, designed to efficiently generate networks with a specified degree distribution, measure fundamental network characteristics, and perform percolation and chain-binomial simulations of SIR disease transmission for generated networks. EpiFire also includes a continuous time, stochastic mass-action simulation class for creating hybrid models and/or comparing the results of mass-action and network-based simulations.
EpiFire allows users to develop efficient epidemic simulations in C++ by providing a high-level API for running simulations and manipulating the underlying contact networks in network-based models. The following examples demonstrate simple use cases.
Example 1: Percolation simulation (API)
This percolation simulation is performed using a random network constructed using the Erdős-Rényi algorithm with 10,000 nodes and mean degree 5. The probability of transmission between an infected node and a susceptible neighbor is 0.25, and the epidemic begins with 10 infected nodes (selected randomly without replacement).
#include <iostream>
#include <Percolation_Sim.h>
using namespace std;

int main() {
    // Construct Network
    Network net("example net", Network::Undirected);
    net.populate(10000);
    net.fast_random_graph(5);

    // Parameterize and run simulation
    Percolation_Sim sim(&net);
    sim.set_transmissibility(0.25);
    sim.rand_infect(10);
    cout << "Expected R0: " << sim.expected_R0() << endl;
    sim.run_simulation();
    cout << "Epidemic size: " << sim.epidemic_size() << endl;
    return 0;
}
Sample output:
Expected R0: 1.25551
Epidemic size: 3423
The output from this example is the expected value of R_{0} if an epidemic occurs (see Implementation) and the total number of individuals infected during the epidemic. By running the simulation many times, we can generate a distribution of epidemic sizes. Alternatively, to generate an epidemic curve, we can report the size of the infected cohort after each round of transmission (see Additional file 1: Appendix A).
Example 1 required 0.06 sec (avg) of running time and 5.45 MB (max) of system memory. The test system was a Dell Precision Workstation 490 with two Intel Xeon 5140 processors and 4 GB of RAM running 32-bit Ubuntu 10.04 LTS. EpiFire was compiled using gcc version 4.4.3 with -O2 optimization. Most of the time is spent constructing the random network; the simulation itself only requires 0.5 ms (avg). Depending on the intended application, it may be acceptable to generate and reuse a single network for many simulations, greatly reducing the time required. Users should note that when reusing a network, sim.reset() should be called in between simulations to reset the state of all nodes to the default susceptible state, as in Appendix A3 of Additional file 1. The running time required for Example 1 scales linearly with the expected epidemic size, whereas the memory required is linear with N * (k + 1), where N is the network size and k is the mean degree.
Additional API examples
Appendix A2 of Additional file 1 includes a more complicated simulation of an epidemic on a dynamic network. Further examples included with the source code are a chain-binomial simulation of a network with an arbitrary degree distribution; a derived percolation class that uses a susceptible-exposed-infectious-recovered progression of states; and a stochastic, continuous time, mass-action simulation using a Gillespie algorithm [40,41].
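For readers unfamiliar with the Gillespie approach, the following standalone C++ sketch simulates a stochastic, continuous time, mass-action SIR epidemic. It is not the EpiFire mass-action class: the function name, the `Outcome` struct, and the parameterization by a transmission rate `beta` and a recovery rate `gamma` (so that R_{0} = beta / gamma in a fully susceptible population) are assumptions made for illustration.

```cpp
#include <random>

struct Outcome {
    int final_size;   // number of individuals ever infected
    double duration;  // time until no infectious individuals remain
};

// Gillespie simulation of mass-action SIR dynamics in a population of N
// individuals, I0 of whom start infectious. Requires gamma > 0.
Outcome gillespie_sir(int N, int I0, double beta, double gamma, std::mt19937& rng) {
    int S = N - I0, I = I0, R = 0;
    double t = 0.0;
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    while (I > 0) {
        const double infection_rate = beta * S * I / N;  // S + I -> 2I
        const double recovery_rate = gamma * I;          // I -> R
        const double total_rate = infection_rate + recovery_rate;

        // Waiting time to the next event is exponential with the total rate.
        std::exponential_distribution<double> waiting_time(total_rate);
        t += waiting_time(rng);

        // Choose which event occurred, in proportion to its rate.
        if (unif(rng) * total_rate < infection_rate) {
            --S; ++I;
        } else {
            --I; ++R;
        }
    }
    return Outcome{R, t};
}
```

Because every infected individual eventually recovers, the number recovered at the end equals the final epidemic size, which allows direct comparison with the final sizes produced by the network-based simulators.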
Comparison with NetworkX and igraph
EpiFire is not intended to replace other network APIs, which were developed to solve different problems. To compare these diverse APIs, we consider one of their common functions: generation of an Erdős-Rényi random network. There are several algorithms that will generate random networks; we chose the most efficient algorithm available in each API when generating a 100,000-node network with a Poisson degree distribution with Poisson parameter (mean) equal to ten. EpiFire requires much less memory and running time than the user-friendly NetworkX, and somewhat less memory and time than igraph (Table 1). The comparatively poor performance of NetworkX is likely due primarily to differences in efficiency between C++ and Python.
Table 1. Comparison with NetworkX and igraph
Overview of EpiFire GUI
Although the EpiFire API provides great flexibility for creating custom simulations, it requires some background in programming. As a demonstration of some of the capabilities of the EpiFire API, we present the EpiFire graphical user interface (GUI), which allows users to generate and analyze several common classes of random networks and conduct chain-binomial and percolation SIR simulations on contact networks in a point-and-click environment with intuitive, automatically generated figures. The EpiFire GUI requires no programming to create and analyze networks and run stochastic simulations on those networks.
Main window
The application's main window (Figure 1) is organized in two panes, with model parameters and application status on the left, and automatically generated plots of simulation data on the right. The left-hand pane is divided from top to bottom, as follows:
Figure 1. EpiFire GUI main application window.
Network parameterization
By default, the tab labeled “Step 1: Choose a network” is active. Users choose whether to import a network from a file or to randomly generate a network. The import format is an edge-list file, with each edge represented as a single line containing the names of the connected nodes, separated by a comma. Currently only undirected networks are supported by the EpiFire GUI. If users choose to generate a network, they may specify the desired number of nodes, the degree distribution type, and relevant parameters for the degree distribution. Generated networks are connected randomly using the configuration model with the constraint that no pair of nodes is connected by more than one edge, and no edges loop back to connect a node to itself. Users may select Poisson, exponential, power law, urban, or fixed degree distributions. Degree distributions are right-truncated at n - 1, where n is the size of the network. Exponential and power law distributions are also left-truncated so that there are no nodes with degree zero. The urban degree distribution is a semi-empirical distribution used previously to study the spread of SARS and influenza in Vancouver, Canada [4,13,14].
Simulator parameterization
By clicking the tab labeled “Step 2: Design a simulation,” users may specify simulation parameters. Epidemics can be simulated under chain-binomial and percolation models. Chain-binomial is the default because it produces epidemic curves with finer temporal resolution, although percolation simulations run faster and will produce the same distribution of final epidemic sizes. Both simulators allow users to specify a transmissibility, the number of infections that should start the epidemic, and the number of simulation repetitions that should be performed. Chain-binomial simulations are also parameterized with an infectious period, defined as the number of time steps an infected individual will remain infected; when this is set to 1, the chain-binomial and percolation models produce equivalent results. Users may also choose whether epidemic data are retained between runs or deleted prior to each new simulation. This determines which data are included in the plots.
Theoretical predictions
The EpiFire GUI also displays the expected R_{0} for the current network and epidemic simulation parameters. Epidemics will not occur when R_{0} is less than one, but may occur otherwise. Given this expected R_{0}, EpiFire calculates expected epidemic sizes under mass-action and configuration model assumptions.
Control panel
The control panel allows users to clear the current network or the current epidemic data from memory, restore the default settings, open the help dialog, generate and load networks, and run a simulation with the specified parameters. Note that when “Generate Network” or “Import Edge List” is clicked, any previous network is automatically cleared, and the “Run Simulation” button is disabled unless a network has been created.
Status log
The status log provides users with updates, including the status of network generation, warnings about incompatible parameters, current simulation number, and final epidemic size.
The right pane of the EpiFire GUI is divided into three plots (initially blank) of simulation results. These plots may be resized by resizing the main window, or by clicking and dragging the horizontal dividers between the top and middle, and the middle and bottom plots. All of the plots created by the EpiFire GUI can be exported by double-clicking the plot, and the data used to generate the plots can be exported by right-clicking.
Node state plot
The top plot shows the progression of states of the first 100 nodes in the network, or all nodes if the network has fewer than 100 nodes. The horizontal axis represents the duration of the epidemic, and each horizontal band represents the states of a particular node. Blue denotes susceptible, red is infectious and yellow is recovered. The range of the horizontal axis is the total duration, in timesteps, of the most recent simulation run. These plots may provide visual insights into synchrony between node states and the relative amount of time nodes spend in each state.
Epidemic curve plot
The middle plot displays the number of individuals in the infectious state at each time step. The most recent epidemic curve (representing the most recent simulation run) is shown in red. If users choose to retain data between simulation runs, then all previous simulations are shown in semitransparent gray. These gray data points effectively become a density plot, so that after many runs, users can see the range of possible outcomes and what a typical epidemic might look like.
Histogram of epidemic sizes
The bottom plot shows how many times epidemics of a given size class have been observed, where epidemic size is defined as the total number of nodes in the recovered state at the end of the epidemic (when there are no remaining infectious nodes). As more simulation runs are compiled, the histogram of observed epidemic sizes more accurately estimates the distribution of possible epidemic sizes.
Network visualization window
Displaying large networks is difficult, especially those with random connections that are uncorrelated with any two-dimensional location. If networks are small (< 100 nodes), it may, however, be useful to display their structure. We provide a “Show network plot” option within the “Plot” menu, which uses a variant of the Fruchterman-Reingold algorithm [42] to plot nodes and edges in a pop-up window. Our algorithm deviates from the classical Fruchterman-Reingold algorithm by preferentially placing high-degree nodes near the center of the plot, rather than starting with a uniform distribution of nodes. This plot is dynamic, allowing users to explore or improve the plot by clicking and dragging nodes to new locations. Users may zoom in and out using the +/- keys, respectively. The network plot option is disabled for networks with more than 500 nodes due to the complexity of the algorithm used.
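For readers unfamiliar with force-directed layout, the sketch below shows one iteration of the classical Fruchterman-Reingold scheme: all node pairs repel with magnitude k²/d, connected nodes attract with magnitude d²/k, and each node's movement is capped by a "temperature". This is not EpiFire's variant (in particular, it does not include the degree-based initial placement described above), and the names `Point` and `fr_step` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };

// One iteration of the classical Fruchterman-Reingold layout.
// k is the ideal edge length; temperature caps how far a node may move.
void fr_step(std::vector<Point>& pos,
             const std::vector<std::pair<int, int>>& edges,
             double k, double temperature) {
    assert(k > 0.0 && temperature >= 0.0);
    const double eps = 1e-9;  // avoids division by zero for coincident nodes
    std::vector<Point> disp(pos.size(), Point{0.0, 0.0});

    // Repulsion between every pair of nodes, magnitude k^2 / d.
    for (std::size_t i = 0; i < pos.size(); ++i)
        for (std::size_t j = i + 1; j < pos.size(); ++j) {
            double dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y;
            double d = std::max(std::sqrt(dx * dx + dy * dy), eps);
            double f = k * k / d;
            disp[i].x += dx / d * f;  disp[i].y += dy / d * f;
            disp[j].x -= dx / d * f;  disp[j].y -= dy / d * f;
        }

    // Attraction along each edge, magnitude d^2 / k.
    for (const auto& e : edges) {
        double dx = pos[e.first].x - pos[e.second].x;
        double dy = pos[e.first].y - pos[e.second].y;
        double d = std::max(std::sqrt(dx * dx + dy * dy), eps);
        double f = d * d / k;
        disp[e.first].x  -= dx / d * f;  disp[e.first].y  -= dy / d * f;
        disp[e.second].x += dx / d * f;  disp[e.second].y += dy / d * f;
    }

    // Move each node along its net displacement, at most `temperature` far.
    for (std::size_t i = 0; i < pos.size(); ++i) {
        double len = std::max(
            std::sqrt(disp[i].x * disp[i].x + disp[i].y * disp[i].y), eps);
        double step = std::min(len, temperature);
        pos[i].x += disp[i].x / len * step;
        pos[i].y += disp[i].y / len * step;
    }
}
```

In a full layout this step is repeated while the temperature is gradually lowered. The all-pairs repulsion loop makes each iteration quadratic in the number of nodes, which is consistent with the plot being restricted to small networks.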
Network analysis window
The “Network” menu includes a “Network analysis” option. If there is a network in memory, a new pop-up window (Figure 2) appears with the node count, edge count, mean degree, and a histogram of the degree distribution. By clicking on the “Calculate” buttons, the user can determine the number of nodes in the largest component, number of components, transitivity clustering coefficient, diameter of the largest component, and mean shortest path in the largest component. Note that the last two calculations are computationally demanding and can take much longer to complete than the others. In some cases, calculating one statistic involves first calculating another. In this situation, all calculated statistics will be shown, even if the user did not click “Calculate” for each of the statistics.
Figure 2. "Analysis of current network" dialog.
The network analysis window is particularly useful for comparing networks with different degree distributions, and for elucidating unexpected simulation results. For example, a simulation with a very high expected R_{0} may fail to create correspondingly large epidemics if the underlying network has multiple components.
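Two of the statistics named above can be sketched compactly. The code below finds component sizes by breadth-first search and computes transitivity as the fraction of connected triples that are closed. It is an independent illustration rather than EpiFire's network class; the function names and adjacency-list input are assumptions, and the graph is assumed to be simple (no self-loops or duplicate edges).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Sizes of all connected components of an undirected graph, where adj[u]
// lists the neighbors of node u.
std::vector<int> component_sizes(const std::vector<std::vector<int>>& adj) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> sizes;
    for (std::size_t start = 0; start < adj.size(); ++start) {
        if (seen[start]) continue;
        int size = 0;
        std::queue<int> q;
        q.push(static_cast<int>(start));
        seen[start] = true;
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            ++size;
            for (int v : adj[u])
                if (!seen[v]) { seen[v] = true; q.push(v); }
        }
        sizes.push_back(size);
    }
    return sizes;
}

// Transitivity: closed connected triples divided by all connected triples.
// adj is taken by value so the neighbor lists can be sorted locally.
double transitivity(std::vector<std::vector<int>> adj) {
    for (auto& nbrs : adj) std::sort(nbrs.begin(), nbrs.end());
    long long triples = 0, closed = 0;
    for (const auto& nbrs : adj)
        for (std::size_t i = 0; i < nbrs.size(); ++i)
            for (std::size_t j = i + 1; j < nbrs.size(); ++j) {
                ++triples;  // nbrs[i] - center - nbrs[j]
                const auto& a = adj[nbrs[i]];
                if (std::binary_search(a.begin(), a.end(), nbrs[j])) ++closed;
            }
    return triples == 0 ? 0.0 : static_cast<double>(closed) / triples;
}
```

The number of components is the length of the returned vector, and the largest component size is its maximum element.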
Simulation results analysis window
Under the “Results” menu is the “Simulation results analysis” option. Once results have been generated, users can open a new window (Figure 3) that automatically calculates basic statistics about the distribution of final epidemic sizes, including minimum, maximum, arithmetic mean, and standard deviation. Because final size distributions are commonly bimodal with the smaller mode corresponding to failed epidemics (called “outbreaks” in the analysis window) and the larger mode corresponding to actual epidemics, these statistics are also calculated separately for the two modes. The EpiFire GUI attempts to heuristically distinguish outbreaks from actual epidemics by checking to see if there is a single large range separating two clusters of data. If such a range exists, the middle of the range is used as the “outbreak/epidemic threshold,” which users may always change to a different value. The epidemic size distribution plot shows values below the threshold in yellow, and those equal to or above the threshold in red. Users may customize the plot by specifying the number of bins and the minimum and maximum values to use on the horizontal axis.
Figure 3. "Analysis of simulation results" dialog.
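The threshold heuristic described above can be sketched as follows: sort the final sizes, find the widest gap between consecutive values, and use its midpoint if the gap is large enough. The text does not say how EpiFire decides that a gap is "large", so the `min_gap` parameter, like the names `Threshold` and `outbreak_threshold`, is a hypothetical stand-in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Threshold {
    bool found;    // whether a sufficiently large gap exists
    double value;  // midpoint of that gap (meaningful only if found)
};

// sizes is taken by value so it can be sorted locally.
Threshold outbreak_threshold(std::vector<int> sizes, int min_gap) {
    assert(min_gap > 0);
    std::sort(sizes.begin(), sizes.end());
    int best_gap = 0;
    double midpoint = 0.0;
    for (std::size_t i = 1; i < sizes.size(); ++i) {
        int gap = sizes[i] - sizes[i - 1];
        if (gap > best_gap) {
            best_gap = gap;
            midpoint = sizes[i - 1] + gap / 2.0;
        }
    }
    if (best_gap >= min_gap) return {true, midpoint};
    return {false, 0.0};
}
```

Runs with a final size below the returned value would then be classed as outbreaks, and those at or above it as epidemics, matching the coloring of the plot.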
Example 2: Percolation simulation (GUI)
The simulation in Example 1 can also be performed using the GUI according to the instructions below. Default settings are assumed unless indicated.
1. Click “Generate Network” or press CTRL-G to create a 10,000-node network with a Poisson degree distribution with expected mean degree equal to 5 (these are the default settings on the “Choose a network” tab).
2. Click the “Design a simulation” tab or press ALT-2.
i. Change “Simulation type” to Percolation
ii. Change “Transmissibility” to 0.25
iii. Change “Initially infected” to 10
3. Click “Run Simulation” or press ENTER to run the simulation.
The epidemic size will be printed in the log window in the lower left. Figures characterizing the simulation run will be automatically generated on the right, including a node-state plot, an epidemic curve plot, and an epidemic size histogram. The epidemic size histogram will better approximate the true final size distribution as additional simulations are performed.
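As a rough check on the simulated output, the settings in this example can be worked through by hand. The expressions below are the standard configuration-model results for a Poisson degree distribution with mean λ (for which ⟨k²⟩ = λ² + λ). They are not values quoted from the EpiFire GUI, and they ignore finite-size effects and the number of initial infections.

```latex
\begin{align*}
R_0 &= T\,\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}
     = T\,\frac{(\lambda^2 + \lambda) - \lambda}{\lambda}
     = T\lambda = 0.25 \times 5 = 1.25, \\
S   &= 1 - e^{-T\lambda S} \quad\Longrightarrow\quad S \approx 0.37 .
\end{align*}
```

Under these assumptions, runs in which an epidemic takes off should infect on the order of 3,700 of the 10,000 nodes, with the remaining runs ending as much smaller outbreaks.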
Discussion
Developing epidemiological simulations that scale effectively to millions of individuals can be challenging. The open source API of EpiFire provides a transparent, logical framework that can be used for standard percolation, chain-binomial, or mass-action SIR simulations. Furthermore, it can be extended to create new, specialized types of simulations, such as networks that change in response to epidemic dynamics, or multi-pathogen simulations where co-infection changes transmission probabilities. EpiFire allows for hybridized models and alternative network interpretations, such as using a mass-action model for within-city dynamics and a network model for between-city dynamics [15].
Several other publicly available software projects have overlapping functionality. However, none has been written specifically for contact network epidemiology with the intent of providing a common, extensible toolkit for researchers to use to develop their own models.
Although EpiFire is intended as an API for contact network epidemiology, the network class is independent from the simulation classes, and is thus applicable to other types of network-based modeling, such as metabolite interaction networks [43] and animal migration between habitats [44].
The EpiFire graphical interface provides a user-friendly toolkit for performing network-based SIR epidemic simulations and gaining an intuitive understanding of the impact of network structure on infectious disease dynamics. The most obvious applications are pedagogical: the straightforward interface and rapid feedback allow users to learn firsthand the consequences of changing epidemic and network parameters. EpiFire has been used in courses at the University of Texas at Austin and at the Summer Institute in Statistics and Modeling in Infectious Diseases (SISMID) at the University of Washington [45]. The EpiFire GUI may be particularly useful to researchers during initial epidemiological explorations of a new contact network because of the ease with which it generates figures and network statistics.
We are currently adding support for deterministic, ordinary differential equation models, which will include derived classes implementing the standard mass-action SIR model, and a network-based SIR model [46,47]. The stochastic, continuous time mass-action model that EpiFire currently provides in MassAction_Sim.h will likely be refactored into a Gillespie model base class and mass-action and network derived classes. Finally, although the EpiFire simulators can be extended beyond SIR epidemic models (e.g., see [48] for the susceptible-exposed-infected-recovered simulator), we would like to provide a generic interface for specifying an arbitrary disease-state progression.
Conclusions
Efficient and easy-to-use software plays a critical role in computational biology research. Contact network approaches in epidemiology provide sophisticated analytical and efficient computational methods, but these can be technically challenging and time consuming to implement. Currently no open-source toolkit is available for facilitating contact network epidemiology research. We present EpiFire, an applications programming and graphical interface, available for Windows, OS X, and Linux online at sourceforge.net/projects/epifire. As the field of contact network epidemiology matures, so should its mathematical and computational toolkit. Open-source code libraries like EpiFire help to avoid programming mistakes, increase the transparency of analyses, and reduce barriers between the conception and implementation of ideas.
Availability and requirements
Project name: EpiFire
Project home page: https://github.com/tjhladish/EpiFire/wiki/ for source code, http://sourceforge.net/projects/epifire/ for binary installers
Operating systems: Platform independent
Programming language: C++
Other requirements: g++ 4.5 for API; g++ 4.5 and Qt 4.7 for GUI
License: GNU GPLv3
Any restrictions for use by nonacademics: none
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
TH conceived of and implemented the project and drafted the manuscript. EM assisted in developing the programmatic and graphical interfaces. LAB developed a beta version of the graphical interface. LAM helped to implement analytical methods, refine the graphical interface and draft the manuscript. AG helped to draft the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We wish to thank Erik Volz for suggesting the development of a graphical interface, and Claus Wilke for suggesting ways to make the software more user-friendly.
Funding was provided by National Institute of General Medical Sciences MIDAS grant U01GM087719 and National Science Foundation grant DEB-0749097 to LAM. This material is based in part upon work supported by the National Science Foundation under Cooperative Agreement No. DBI-0939454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References

Kermack WO, McKendrick AG: Contributions to the mathematical theory of epidemics–I. 1927. Bull Math Biol 1991, 53:33-55.
Anderson RM, May RM: Infectious diseases of humans. Oxford University Press; 1991.
Rahmandad H, Sterman J: Heterogeneity and network structure in the dynamics of diffusion: Comparing agent-based and differential equation models. Manag Sci 2008, 54:998-1014.
Meyers LA, Pourbohloul B, Newman MEJ, Skowronski DM, Brunham RC: Network theory and SARS: predicting outbreak diversity. J Theor Biol 2005, 232:71-81.
Meyers LA: Contact network epidemiology: Bond percolation applied to infectious disease prediction and control.
Bridges CB, Kuehnert MJ, Hall CB: Transmission of influenza: implications for control in health care settings. Clin Infect Dis 2003, 37:1094-1101.
Li S, Eisenberg JNS, Spicknall IH, Koopman JS: Dynamics and Control of Infections Transmitted From Person to Person Through the Environment. Am J Epidemiol 2009, 170:257-265.
Volz E, Meyers LA: Susceptible-infected-recovered epidemics in dynamic contact networks. Proc Biol Sci 2007, 274:2925-2933.
WHO Cumulative Number of Reported Cases (SARS) http://www.who.int/csr/sars/country/2003_03_26/en/index.html
Liljeros F, Edling CR, Amaral LA, Stanley HE, Aberg Y: The web of human sexual contacts. Nature 2001, 411:907-908.
Bearman P, Moody J, Stovel K: Chains of affection: The structure of adolescent romantic and sexual networks.
Bansal S, Pourbohloul B, Meyers LA: A Comparative Analysis of Influenza Vaccination Programs. PLoS Med 2006, 3:e387.
Pourbohloul B, Meyers LA, Skowronski DM, Krajden M, Patrick DM, Brunham RC: Modeling control strategies of respiratory pathogens. Emerg Infect Dis 2005, 11:1249-1256.
Dimitrov NB, Goll S, Hupert N, Pourbohloul B, Meyers LA: Optimizing tactics for use of the U.S. antiviral strategic national stockpile for pandemic influenza. PLoS One 2011, 6:e16094.
Volz E, Frost SDW, Rothenberg R, Meyers LA: Epidemiological bridging by injection drug use drives an early HIV epidemic. Epidemics 2010, 2:155-164.
Abbey H: An examination of the Reed-Frost theory of epidemics. Hum Biol 1952, 24:201-233.
Ferrari MJ, Bansal S, Meyers LA, Bjornstad ON: Network frailty and the geometry of herd immunity. Proc R Soc B Biol Sci 2006, 273:2743-2748.
Leath PL: Cluster size and boundary distribution near percolation threshold. Phys Rev B 1976, 14:5046-5055.
Hagberg AA, Schult DA, Swart PJ: Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena, CA USA; 2008:11-15.
Csardi G, Nepusz T: The igraph software package for complex network research.
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M: statnet: Software Tools for the Representation, Visualization, Analysis and Simulation of Network Data. J Stat Softw 2008, 24:1-11.
Livnat Y, Gesteland P, Benuzillo J, Pettey W, Bolton D, Drews F, Kramer H, Samore M: Epinome - a novel workbench for epidemic investigation and analysis of search strategies in public health practice. AMIA Annu Symp Proc 2010, 2010:647-651.
Estimation and Projection Package (EPP) http://www.unaids.org/en/dataanalysis/tools/estimationandprojectionpackageepp/
Chao DL, Halloran ME, Obenchain VJ, Longini IM: FluTE, a Publicly Available Stochastic Influenza Epidemic Simulation Model. PLoS Comput Biol 2010, 6:e1000656.
CDC H1N1 Flu Preparedness Tools for Professionals http://www.cdc.gov/h1n1flu/tools/
Del Valle S, Stroud P, Smith J, Mniszewski S, Riese J, Sydoriak S, Kubicek D: EpiSimS: Epidemic Simulation System.
Ford DA, Kaufman JH, Eiron I: An extensible spatial and temporal epidemiological modelling system.
Bansal S, Grenfell BT, Meyers LA: When individual behaviour matters: homogeneous and network models in epidemiology. J R Soc Interface 2007, 4:879-891.
Blanchette J: C++ GUI programming with Qt 4. 2nd edition. Prentice Hall in association with Trolltech Press, Extensively rev. and expanded. Upper Saddle River NJ; 2008.
Molloy M, Reed B: A critical point for random graphs with a given degree sequence. Random Struct Algorithm 1995, 6:161-179.
Watts DJ, Strogatz SH: Collective dynamics of “small-world” networks. Nature 1998, 393:440-442.
On the uniform generation of random graphs with prescribed degree sequences http://arxiv.org/abs/cond-mat/0312028v2
Matsumoto M, Nishimura T: Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans Model Comput Simulat 1998, 8:3-30.
EpiFire Chain-binomial Simulator https://raw.github.com/tjhladish/EpiFire/master/src/ChainBinomial_Sim.h
EpiFire Percolation Simulator https://raw.github.com/tjhladish/EpiFire/master/src/ChainBinomial_Sim.h
Burden RL, Faires JD: Numerical analysis. Prindle, Weber & Schmidt, Boston, Mass; 1985.
Newman MEJ, Watts DJ, Strogatz SH: Random graph models of social networks. Proc Natl Acad Sci U S A 2002, 99 Suppl 1:2566-2572.
Gillespie DT: Exact stochastic simulation of coupled chemical reactions. J Phys Chem 1977, 81:2340-2361.
Keeling MJ, Rohani P: Modeling infectious diseases in humans and animals. Princeton University Press, Princeton; 2008.
Fruchterman TMJ, Reingold EM: Graph drawing by force-directed placement. Softw Pract Exper 1991, 21:1129-1164.
Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BO: Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci 2007, 104:1777-1782.
Nieminen M: Migration of moth species in a network of small islands. Oecologia 1996, 108:643-651.
SISMID 2011: Home http://depts.washington.edu/sismid/
Volz E: SIR dynamics in random networks with heterogeneous connectivity. J Math Biol 2007, 56:293-310.
Miller JC: A note on a paper by Erik Volz: SIR dynamics in random networks. J Math Biol 2010, 62:349-358.
EpiFire SEIR Percolation Simulator https://raw.github.com/tjhladish/EpiFire/master/research/SEIR_Percolation_Sim.h