Abstract
Background
In recent years, large-scale computational models for the realistic simulation of epidemic outbreaks have been used with increasing frequency. Methodologies adapt to the scale of interest and range from very detailed agent-based models to spatially structured metapopulation models. One major issue thus concerns the extent to which the geotemporal spreading patterns found by different modeling approaches differ and depend on the approximations and assumptions used.
Methods
We provide for the first time a side-by-side comparison of the results obtained with a stochastic agent-based model and a structured metapopulation stochastic model for the progression of a baseline pandemic event in Italy, a large and geographically heterogeneous European country. The agent-based model is based on the explicit representation of the Italian population through highly detailed data on the sociodemographic structure. The metapopulation simulations use the GLobal Epidemic and Mobility (GLEaM) model, based on high-resolution census data worldwide, and integrating airline travel flow data with short-range human mobility patterns at the global scale. The model also considers age structure data for Italy. GLEaM and the agent-based model are synchronized in their initial conditions by using the same disease parameterization, and by defining the same importation of infected cases from international travel.
Results
The results obtained show that both models provide epidemic patterns that are in very good agreement at the granularity levels accessible by both approaches, with differences in peak timing on the order of a few days. The relative difference of the epidemic size depends on the basic reproductive ratio, R_{0}, and on the fact that the metapopulation model consistently yields a larger incidence than the agent-based model, as expected due to the differences in the structure of the intra-population contact pattern of the two approaches. The age breakdown analysis shows that similar attack rates are obtained for the younger age classes.
Conclusions
The good agreement between the two modeling approaches is very important for defining the trade-off between data availability and the information provided by the models. The results we present define the possibility of hybrid models combining the agent-based and the metapopulation approaches according to the available data and computational resources.
Background
Computational approaches for the detailed modeling of epidemic spread in spatially structured environments make use of a wide array of simulation schemes [1,2]. In recent years, two major classes of methodologies emerged in the simulation of influenza-like illnesses (ILIs) and other emerging infectious diseases. The first one is the very accurate epidemic description with agent-based models, which keep track of each individual in the population in an extremely detailed way [3-14]. The second scheme relies on metapopulation structured models that consider in a detailed way the long-range mobility scheme at the inter-population level while using coarse-grained techniques at the intra-population level [15-25]. Agent-based models provide a very rich data scenario, but the computational cost and, most importantly, the need for very detailed input data have limited their use to country-level [6-11] or continental-level [12] scenarios so far. On the opposite side, the structured metapopulation models are fairly scalable and can be conveniently used to provide worldwide scenarios and patterns with thousands of stochastic realizations [18,20,21,23-25]. While the level of information that can be extracted in this latter case is less detailed than that of agent-based models, the spatial and temporal ranges and the number of realizations that can be computationally analyzed are much larger. Also, the amount of data to be integrated is less massive than in agent-based frameworks. From this perspective, it is clearly important to assess the level of agreement that the two different approaches can provide with respect to the quantities accessible, the respective data needed, and the computational costs associated with both approaches.
Comparing different models is often a hard task. While on one side one would like to assess the role of the differences inherent to each of the modeling frameworks, it is important to establish a common ground between the two frameworks in order to discount unwanted effects due to different parameterization (see for example the discussion of the estimation of the reproductive number for the SARS epidemic obtained from a variety of models in Ref. [26]). An attempt in this direction was presented in Ref. [10], where three individual-based models with different assumptions and data (one at the description level of a city and two at the description level of a country) were compared through their predictions in the case of interventions against a new pandemic influenza strain. However, the comparison was constrained to each model's assumptions and to the available simulated scenarios, without explicitly defining a common set of parameters and approximations to be shared by all models. The low transmission scenario was compared in different models by using different values for the reproductive number, with the risk of not being able to discount the effect of this difference in the obtained results.
Here we provide for the first time a side-by-side comparison of the results obtained at the level of a single country by using state-of-the-art structured metapopulation and agent-based models developed independently and employed in previous works to analyze pandemic events [8,9,11,12,18,24,25,27]. Both models have been used in realistic scenarios [14,27] and have incorporated actual data in relation to the H1N1 pandemic [24,28]. However, comparing simulation results with real data would require a thorough discussion and analysis of the disease parameters, the identification of the initial conditions, and the assessment of the reliability of the reporting and notification systems that are the sources of the empirical data. This is not the object of this paper. Instead, we focus on the differences generated by the two modeling approaches.
For the sake of clarity we compare the two models in a clean synthetic experiment of a hypothetical pandemic event for which we assume the same parameterization with regard to the modeling aspects that the models share, such as disease progression and initial conditions. The country used for the study is Italy, a large European country that provides the necessary geographic and population heterogeneity to assess the models' performance in the case of highly structured populations. The two approaches access different granularity levels, and we use for the comparison the finest spatial resolution accessible by both models. This allows us to analyze 39 major subpopulations and project data at the administrative level of municipality.
We find that both models, despite the difference in the data integration and model structure, provide epidemic profiles with spatiotemporal patterns in very good agreement. The epidemic size profile shows an expected overall mismatch of 5-10% depending on the reproductive rate, which is induced by the homogeneous assumption of the metapopulation strategy. Breaking down data at the level of age-structured compartments shows that both models provide very similar results with the exception of the elderly population (60+ age bracket), which shows larger epidemic sizes in the metapopulation approach. The good agreement of the two approaches reinforces the message that computational approaches are stable with respect to different data integration strategies and modeling assumptions. On the other hand, the agent-based approach may access information that is not available to the coarser metapopulation approach and that is relevant for individually based or targeted intervention measures. This comes at the price of a higher computational cost and the need for fine-resolution data, whereas the metapopulation approach is less dependent on detailed data and is computationally cheaper. The presented results hint at the possibility of combining the two methodologies in order to devise multiscale approaches that use the data parsimony of the metapopulation approaches at the global level and the high resolution of the agent-based model in specific locations of interest where detailed data are available.
Methods
The agent-based modeling scheme
The considered agent-based model is a stochastic, spatially explicit, discrete-time simulation model where the agents represent human individuals. The infection can spread among individuals through contacts with household members, school and workplace colleagues, and by random contacts with the general population [5,6]. One of the key features of the model is the characterization of the network of contacts among individuals based on a realistic model of the sociodemographic structure of the Italian population [8,9].
Population data for Italy (56,995,744 individuals) are obtained from the census of 2001 [29] (382,534 census sections). According to the administrative borders of the country under study, the population is hierarchically grouped by municipalities (8,101), provinces (103) and regions (20), which also provide the spatial structure of the model (see Figure 1 and the Additional File 1 for details). Census data on age structure and frequencies of household type and size are jointly used with specific survey data on Italian households [30] to assign age and to co-locate individuals in households. For each municipality, an appropriate number of households (and individuals) is generated to match the actual resident population.
Figure 1. Agent-based model and GLEaM. Top: The agent-based model is a stochastic and spatially explicit simulation model where the agents represent individuals. The basic spatial structures considered in the model are the municipalities. The force of infection in the general population is assumed to decrease with the geographic distance among municipalities. The dependence on the distance is modeled by a gravity model as derived by the analysis of data on travel to school or work (grouped by all hierarchical administrative levels, from the national level down to the municipality level). The inset shows the explicit representation of individuals in the model, enabling the simulation of the most important contacts for disease transmission, i.e. household, school, and workplace contacts. The spatial spread of the epidemic is determined by i) transmission in the general population at the national scale and ii) transmission in schools and workplaces at a more local scale. Bottom: GLEaM, GLobal Epidemic and Mobility model. The world surface is represented in a grid-like partition where each cell (corresponding to a population value) is assigned to the closest airport. Geographic census areas emerge that constitute the subpopulations of the metapopulation model. The demographic layer is coupled with two mobility layers, the short-range commuting layer and the long-range air travel layer.
Additional file 1. Supplementary information comparing large-scale computational approaches to epidemic modeling: agent-based versus structured metapopulation models. A single PDF file of 22 pages; the figures are embedded in the PDF.
Demographic, school, and industry census data from 2001 [31,32] are used for assigning an employment category (student, worker, or unemployed/retired) to individuals on an age basis. The legal working age in Italy is 15. Data on school attendance are available for individuals aged ≤ 14 years for any one-year age class. For individuals aged ≥ 15 years, data on school attendance and employment rate are available for any one-year age class. An employment category is assigned to any individual by sampling from the age-dependent distribution of the frequencies of employment as obtained from the analysis of the data described above. In the model we first assign a size to schools and workplaces on the territory (schools and workplaces are spatially distributed proportionally to the population). Then we locate students and workers in the different places in such a way that the probability density function of travel distances complies with available commuting data for Italy.
Data on the proportion of individuals aged ≥ 15 working or attending school in their municipality of residence are available for each municipality, together with the number of individuals traveling to a municipality of the same province they live in, outside the province but within the same region, or outside the region. For determining the probability w_{ij} of commuting from municipality i to municipality j we use a general gravity model used in transportation theory [33,34] of the form
$$w_{ij} = \theta \, \frac{N_i^{\tau_f} \, N_j^{\tau_t}}{d_{ij}^{\rho}} \qquad (1)$$
where N_{i} and N_{j} are the numbers of individuals living in municipalities i and j respectively, d_{ij} is the distance between the two municipalities, θ is a proportionality constant, τ_{f} = 0.28 and τ_{t} = 0.66 tune the dependence of dispersal on donor and recipient sizes, and ρ = 2.95 tunes the dependence on the distance. Here we assume a power law functional form for the distance dependence, as in [35], although other functional forms, such as an exponential decay, can be considered [25,33,34].
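As an illustration of how such a gravity law turns into commuting probabilities, the following minimal sketch evaluates the power-law form θ N_i^{τ_f} N_j^{τ_t} / d_{ij}^{ρ} with the exponents quoted above and normalizes the weights over the possible destinations. The municipality names, sizes and distances are made up, and the normalization step and function names are assumptions of this sketch rather than the procedure actually used in the model.

```python
# Exponents reported in the text; THETA is a placeholder because it cancels on normalization.
TAU_F, TAU_T, RHO, THETA = 0.28, 0.66, 2.95, 1.0


def gravity_weight(n_i, n_j, d_ij):
    """Unnormalized flow from a donor of size n_i to a recipient of size n_j at distance d_ij (km)."""
    return THETA * (n_i ** TAU_F) * (n_j ** TAU_T) / (d_ij ** RHO)


def commuting_probabilities(origin, populations, distances):
    """Normalize the gravity weights over all destinations other than the origin."""
    weights = {
        j: gravity_weight(populations[origin], populations[j], distances[(origin, j)])
        for j in populations
        if j != origin
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical toy municipalities (sizes and distances are invented for illustration).
populations = {"A": 50_000, "B": 200_000, "C": 200_000}
distances = {("A", "B"): 10.0, ("A", "C"): 40.0}
probs = commuting_probabilities("A", populations, distances)
print(probs)
```

With two destinations of equal size, the ratio of the two probabilities depends only on the distance ratio raised to the power ρ, which shows how strongly the value ρ = 2.95 penalizes long commutes.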
The epidemic transmission model assumes that the infection can be transmitted within households, schools, workplaces, and by random contacts in the general population. Any susceptible individual i at any time t of the simulation has a probability
$$p_i = 1 - e^{-\Delta t \, \lambda_i} \qquad (2)$$
of being infected, where Δt is the time step of the simulation and λ_{i} is the instantaneous risk of infection. The latter is the sum of the risks coming from the three sources of infection: (1) contacts with infectious members of the household, (2) contacts with infectious individuals working in the same workplace or attending the same school, and (3) random contacts with infectious individuals in the population. While we assume homogeneous mixing in households, schools and workplaces, random contacts in the general population are assumed to depend explicitly on distance. Specifically, the contribution to the force of infection determined by an infectious individual k is weighted by the following kernel
$$K(d_{ik}) = \frac{1}{1 + (d_{ik}/a)^{b}} \qquad (3)$$
a decreasing function of the geographical distance d_{ik}. Parameters a and b were optimized by employing Eq. (3) for generating a synthetic population of commuters such that the resulting probability density function of travel distances matches that obtained by using the gravity model of Eq. (1). The estimated parameters are a = 3.8 km and b = 2.32. As in [5,8,9], the model is parameterized so that 33% of transmission occurs in households, 33% in schools and workplaces and 33% in the general community. The epidemic transmission dynamics is based on an ILI compartmentalization as described in the subsection Models calibration (full details on the formulation of the model are provided in the Additional File 1).
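The per-step infection probability and the distance weighting can be sketched as follows. The sketch assumes the usual forms p = 1 - exp(-Δt λ) for the infection probability and K(d) = 1 / (1 + (d/a)^b) for the kernel, with a and b as quoted above; the transmission rate `beta_c`, the distances and the function names are hypothetical, and the household and school/workplace contributions to λ are omitted.

```python
import math

A_KM, B = 3.8, 2.32  # kernel parameters reported in the text


def distance_kernel(d_km):
    """Assumed kernel form 1 / (1 + (d/a)^b): equals 1 at d = 0 and decays with distance."""
    return 1.0 / (1.0 + (d_km / A_KM) ** B)


def infection_probability(force_of_infection, dt):
    """Probability of infection during one time step of length dt (days)."""
    return 1.0 - math.exp(-dt * force_of_infection)


def community_risk(beta_c, distances_km):
    """Risk from random contacts: a hypothetical rate beta_c weighted by the kernel
    for each infectious individual at the given distances."""
    return beta_c * sum(distance_kernel(d) for d in distances_km)


# Three infectious individuals at 0.5 km, 3.8 km and 50 km from the susceptible one.
lam = community_risk(beta_c=0.05, distances_km=[0.5, 3.8, 50.0])
p = infection_probability(lam, dt=0.5)
print(lam, p)
```

Under this form the parameter a is the distance at which the weight of an infectious contact is halved, so an individual 50 km away contributes very little compared with one in the same neighborhood.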
Metapopulation modeling scheme
The Global Epidemic and Mobility (GLEaM) model is based on a metapopulation approach [15-21] in which the world is divided into geographical regions defining a subpopulation network where connections among subpopulations represent the individual fluxes due to the transportation and mobility infrastructure [24,25]. Infection spread occurs inside each urban area and is described by compartmental schemes in which the discrete stochastic dynamics of the individuals among different compartments depends on the specific etiology of the disease and the containment interventions considered. GLEaM integrates a highly detailed population database worldwide with the air transportation infrastructure and short-range mobility patterns [24,25]. Air travel mobility is obtained from the International Air Transport Association (IATA [36]) database that contains the list of worldwide airport pairs connected by direct flights and the number of available seats on any given connection [37]. The resulting worldwide air-transportation network is a weighted graph composed of 3,362 vertices denoting airports in 220 different countries and 16,846 weighted edges whose weight, ω_{jl}, represents the number of passengers flying between airports j and l, accounting for 99% of worldwide traffic. Each airport is associated with a georeferenced census area as obtained from a Voronoi tessellation on the population database [25]. GLEaM is based on the high-resolution population database of the "Gridded Population of the World" project of SEDAC [38] (Columbia University), which estimates the population with a granularity given by a lattice of cells covering the whole planet at a resolution of 15 × 15 minutes of arc. We define the geographical census areas centered on IATA airports by assigning each cell to the closest airport as long as the distance between the center of the cell and the airport is less than 200 km.
This is the characteristic length scale of the cell/airport distribution as well as the scale for the intensity of the ground commuting flows [24]. Such a procedure divides Italy into 39 distinct areas (subpopulations) that define the metapopulation structure we use. A schematic illustration of the model and of the layers considered is reported in Figure 1.
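The assignment of grid cells to census areas described above can be sketched as a nearest-airport search with a distance cutoff. The use of the haversine great-circle distance, the approximate airport coordinates and the cell names are assumptions of this sketch; the text only specifies the closest-airport rule and the 200 km threshold.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 200.0  # cutoff stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_cells(cells, airports):
    """Map each cell centre to its closest airport, or to None beyond the cutoff."""
    assignment = {}
    for cell_id, (clat, clon) in cells.items():
        code, dist = min(
            ((code, haversine_km(clat, clon, alat, alon)) for code, (alat, alon) in airports.items()),
            key=lambda pair: pair[1],
        )
        assignment[cell_id] = code if dist < MAX_DISTANCE_KM else None
    return assignment


# Approximate coordinates, for illustration only.
airports = {"FCO": (41.80, 12.25), "MXP": (45.63, 8.72)}
cells = {"near_rome": (41.9, 12.5), "near_milan": (45.5, 9.2), "mid_sea": (40.0, 5.0)}
areas = assign_cells(cells, airports)
print(areas)
```

Cells that share the same airport form one census area, which is how a tessellation of this kind produces a small number of subpopulations from a fine population grid.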
The georeferenced nature of the subpopulations allows for the integration of short-scale mobility between adjacent subpopulations into the model. GLEaM considers commuting and mobility patterns of various means of land transportation (bus, car, train, etc.). National commuting data available at administrative levels are then mapped into the geographic census areas obtained from the tessellation procedure [25,33,34]. In the present study we use real mobility data for Italian municipalities as provided by the Italian National Statistics and Census Bureau (ISTAT) to obtain the commuting flows among the census areas defining the Italian subpopulations.
GLEaM is fully stochastic and can simulate the long-range mobility of individuals from one subpopulation to another by means of the airline transportation network in a manner similar to the models presented in Refs. [15-25]. In particular, in each city j the numbers of passengers traveling on each connection j → l at time t define a set of stochastic variables that follow a multinomial distribution [22]. The calculation can be extended to include transit traffic as well, e.g. up to one connection flight [39]. Short-range, multimodal transportation between subpopulations is modeled with a time-scale separation approach that defines an effective force of infection in connected subpopulations based on the real commuting flow data between adjacent subpopulations integrated in the model [25,40,41]. The discrete nature of individuals is also preserved in compartmental transitions and in short-range mobility processes. The transmission model within each geographical census area follows an ILI compartmentalization common to the agent-based model, as shown in the following section. The contagion process (i.e. the generation of new latent individuals from the contact of infectious and susceptible individuals) and the spontaneous transitions (e.g. from latent to infectious or from infectious to recovered) are modeled with multinomial distributions. The actual expressions used for the force of infection contain several terms, as they have to discount non-traveling infectious individuals and second-order terms generated by the interactions of individuals from neighboring subpopulations. Here we also introduce the age structure of the population by defining a contact matrix specifying the force of infection across different age brackets. We adopt the contact matrix formalism and the age classes defined by Wallinga and collaborators [42]. In this case the basic reproduction number R_{0} is determined by the largest eigenvalue of the modified next generation matrix.
The full derivation of the epidemic model and its implementation is reported in the Additional File 1.
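The multinomial sampling of air travelers can be illustrated with a short sketch that uses only the standard library. It assumes that each individual of a given compartment in city j takes connection j → l with probability ω_{jl}/N_j and otherwise stays, so that the counts per destination are jointly multinomial; this per-capita probability, the airport codes and the passenger numbers are assumptions of the sketch, and the expressions actually used in GLEaM are those of the Additional File 1.

```python
import random


def sample_travelers(compartment_count, population, passengers, rng):
    """Draw how many individuals of one compartment leave on each connection in one day.

    Each individual independently picks destination l with probability
    passengers[l] / population, or stays; the resulting counts are multinomial.
    """
    destinations = list(passengers)
    probs = [passengers[dest] / population for dest in destinations]
    stay = 1.0 - sum(probs)
    outcomes = rng.choices(destinations + [None], weights=probs + [stay], k=compartment_count)
    counts = dict.fromkeys(destinations, 0)
    for outcome in outcomes:
        if outcome is not None:
            counts[outcome] += 1
    return counts


rng = random.Random(42)
# Hypothetical daily passenger numbers out of a city of 100,000 inhabitants.
passengers = {"LHR": 1_500, "CDG": 1_000, "JFK": 500}
travelers = sample_travelers(compartment_count=2_000, population=100_000, passengers=passengers, rng=rng)
print(travelers)
```

Drawing integer counts in this way keeps the number of individuals discrete, so that small numbers of latent or infectious travelers are handled stochastically instead of as fractional flows.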
Models calibration
In order to study the effect of the assumptions related to the different approaches exclusively, we align the set of parameters for the disease transmission model and the initial conditions in both models (see Table 1). The agent-based and metapopulation models are stochastic, spatially structured, and based on discrete time simulations. Though the social and mobility structure changes across the models, both GLEaM and the agent-based model are based on the same transmission dynamics. The models adopt a compartmentalization for an ILI defined in terms of susceptible (S), latent (L), asymptomatic infectious (I^{a}), symptomatic infectious (I), and permanently recovered/removed (R) (see Figure 2).
Table 1. Model parameters
Figure 2. Disease compartmental structure. Diagram flow of the infection transmission structure adopted by both models. The transition from the susceptible class to the latent class is induced by the interaction between the susceptible individuals and the infectious individuals (see text).
A susceptible individual in contact with a symptomatic or asymptomatic infectious person can contract the infection and enter the latent compartment, where the individual is infected but not yet infectious. The transmission occurs at different rates that take into account the reduced infectiousness of asymptomatic individuals and additional effects, e.g. those induced by absenteeism that are considered in the agent-based model (a full discussion is reported in the Additional File 1). At the end of the latency period, each latent individual becomes symptomatic with probability 1 - p_{a} or becomes asymptomatic with probability p_{a}. All infectious individuals recover permanently (i.e. become immunized from further infection) and enter the recovered compartment at rate μ. We fix the average latency period ε^{-1} = 2 days and the average infectious period μ^{-1} = 3 days [4,18,43], equal in the two models. Given that infection has occurred, both GLEaM and the agent-based model assume that individuals become asymptomatic with probability p_{a} = 0.33 [4,18,43], with a relative infectiousness equal to r_{β} = 0.5. In addition, both models assume that clinical disease affects individual behavior. GLEaM assumes that symptomatic individuals avoid traveling with probability 1 - p_{t} = 0.5 [18,43], whereas the agent-based model considers the reduction of school and work attendance [5,6,8] (see the Additional File 1 for details). The spreading rate of the disease is governed by the basic reproduction number (R_{0}), which is defined as the average number of infected cases generated by the introduction of a typical infectious person into a fully susceptible population [44]. For the proposed compartmentalization, its value can be obtained for GLEaM by evaluating the largest eigenvalue of the Jacobian or next generation matrix of the infection dynamics in a disease-free equilibrium [45], yielding R_{0} = βμ^{-1}(1 - p_{a} + r_{β}p_{a}) if the age structure is not considered.
In the case of the agent-based model, it is computed as
$$R_0 = \left(1 + r\,\varepsilon^{-1}\right)\left(1 + r\,\mu^{-1}\right)$$
where r is the intrinsic growth rate of the simulated epidemic.
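The two routes to R_{0} can be made concrete with a small numerical sketch. It uses the expression R_{0} = βμ^{-1}(1 - p_{a} + r_{β}p_{a}) given above to obtain the transmission rate β for a target R_{0}, and it assumes the standard relation R_{0} = (1 + rε^{-1})(1 + rμ^{-1}) for exponentially distributed latent and infectious periods to link R_{0} with the intrinsic growth rate r. The function names are hypothetical, and the agent-based model estimates r from simulated epidemics instead of inverting the formula.

```python
MU_INV, EPS_INV = 3.0, 2.0   # mean infectious and latency periods in days (from the text)
P_A, R_BETA = 0.33, 0.5      # asymptomatic probability and relative infectiousness (from the text)


def r0_from_beta(beta):
    """R0 of the compartmental scheme without age structure."""
    return beta * MU_INV * (1.0 - P_A + R_BETA * P_A)


def beta_from_r0(r0):
    """Transmission rate (per day) that yields the target R0."""
    return r0 / (MU_INV * (1.0 - P_A + R_BETA * P_A))


def r0_from_growth_rate(r):
    """Assumed relation for exponentially distributed latent and infectious periods."""
    return (1.0 + r * EPS_INV) * (1.0 + r * MU_INV)


def growth_rate_from_r0(r0):
    """Positive root of eps_inv * mu_inv * r^2 + (eps_inv + mu_inv) * r + 1 - r0 = 0."""
    a, b, c = EPS_INV * MU_INV, EPS_INV + MU_INV, 1.0 - r0
    return (-b + (b * b - 4 * a * c) ** 0.5) / (2 * a)


for target in (1.5, 1.9, 2.3):
    print(target, beta_from_r0(target), growth_rate_from_r0(target))
```

Under these assumptions a larger R_{0} corresponds to a faster initial exponential growth, which is the quantity that can be measured directly from the simulated incidence curves.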
The two models are calibrated to the same value of the reproductive number R_{0}. In addition, GLEaM and the agent-based model are also dynamically calibrated in that they share exactly the same initial/boundary conditions. GLEaM is defined at the worldwide scale and allows the study of an emerging epidemic under a variety of geographical and temporal initial conditions based on any geographical census area of the model at any time of the year. The agent-based model is defined at the level of the country and, as in other individual-based stochastic simulations describing the scale of a given region [3,6,7], it is based on the importation of cases from abroad. The case importation is generally modeled through a global unstructured SEIR compartmental model that simulates the epidemic worldwide and feeds the country of interest through cases arriving at the international airports in proportion to the traffic of the airports.
Several procedures can be modeled, including both those with stationary initial conditions, in which the simulations let the epidemic progress after the first seeding has occurred with no additional importation of cases [9], and those with dynamic initial conditions, in which the importation of cases is not stopped by the beginning of the epidemic in the country under study [6,8]. In order to align GLEaM and the agent-based model under the same initial conditions, we assume dynamic importation of cases in the agent-based model as provided by GLEaM. We choose Hanoi, Vietnam, as the seed of the epidemic for GLEaM and study the geotemporal spreading pattern of the epidemic at the worldwide scale. The number of infected individuals imported into Italy at each international airport is tracked in time in each stochastic realization and provides the set of dynamic initial conditions for the agent-based model. This approach allows us to study the evolution of the epidemic in Italy with the two models side by side, discounting the effects that relate to different seeding at the boundary of the country.
Here we study a pandemic baseline scenario, assuming no seasonality as in Refs. [6-8], taking on three values for the reproductive number, R_{0} = 1.5, 1.9, and 2.3, in the range of expected values for a newly emerging influenza pandemic as based on estimates for previous pandemics [15,46]. We do not implement intervention strategies because our aim is to explore the effect of two different modeling frameworks in shaping the epidemics, assessing analogies and differences induced by each model's assumptions.
All results in the following section are based on 50 stochastic realizations per model, each realization feeding the two models with equal dynamic initial conditions. Results are reported at different resolution scales, including the country level, the geographical census areas around major transportation hubs, and the smallest scale of municipalities. Italy includes 8,101 municipalities that are grouped in 39 GLEaM geographical census areas.
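How repeated stochastic realizations of the shared compartmental scheme produce a spread of outcomes can be illustrated with a minimal chain-binomial sketch for a single homogeneously mixed population. It is not either of the two models compared here: the daily time step, the population size, the number of seeds, the number of runs and the helper names are illustrative assumptions, and only the disease parameters are those quoted above.

```python
import math
import random

EPS, MU = 1 / 2.0, 1 / 3.0   # latent -> infectious and recovery rates (per day), from the text
P_A, R_BETA = 0.33, 0.5      # asymptomatic probability and relative infectiousness


def binomial(n, p, rng):
    """Binomial draw with the standard library only (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def run_realization(r0, population, seeds, days, rng):
    """One stochastic run with daily steps; returns the final attack rate."""
    beta = r0 * MU / (1.0 - P_A + R_BETA * P_A)
    s, l, ia, i, r = population - seeds, seeds, 0, 0, 0
    for _ in range(days):
        force = beta * (i + R_BETA * ia) / population
        new_latent = binomial(s, 1.0 - math.exp(-force), rng)
        leaving_latent = binomial(l, 1.0 - math.exp(-EPS), rng)
        new_asym = binomial(leaving_latent, P_A, rng)
        rec_asym = binomial(ia, 1.0 - math.exp(-MU), rng)
        rec_sym = binomial(i, 1.0 - math.exp(-MU), rng)
        s -= new_latent
        l += new_latent - leaving_latent
        ia += new_asym - rec_asym
        i += (leaving_latent - new_asym) - rec_sym
        r += rec_asym + rec_sym
    return (population - s) / population


rng = random.Random(1)
attack_rates = [run_realization(1.9, 2_000, 20, 200, rng) for _ in range(5)]
print(attack_rates)
```

Averages and reference ranges such as those reported in the following section are summaries over a set of realizations of this kind, here reduced to five runs of a toy population.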
Results
Country scale
Figure 3 shows the timeline of the incidence profile and of the epidemic size obtained with GLEaM and with the agent-based model. Time is expressed in days, and the first importation of infectious individuals into Italy is used to synchronize the two models. Thanks to the initial alignment, Figure 3 shows the epidemic unfolding side by side in the same time window explored by the two models, so that it is possible to assess the timing and synchronization of the simulated epidemics. The incidence profiles show that on average the two temporal patterns are in very good agreement, despite the very different data integration and assumptions of the two models. The two peaks are just a few days apart from each other, with GLEaM on average reaching the peak of the epidemic slightly later than the agent-based model. The value of the epidemic incidence at the peak in the simulations obtained with the agent-based model is lower than in the simulations with the GLEaM model. This difference is to be expected, since we are comparing an individual-based approach with a spatially structured model based on an assumption of homogeneous transmission rates for the interactions of people in the subpopulations. Indeed, as observed in earlier works, models with heterogeneous transmission rates across population groups present different (usually lower) attack rates than those with homogeneous mixing, even for the same overall value of R_{0} (see for instance the discussion in [47,48] and references therein). Changes in attack rates and even epidemic thresholds are also observed when the full interaction pattern of individuals is considered [49-51]. While the GLEaM model just considers the spatial structure and the age structure, the agent-based model used here is highly structured and considers households, schools, etc. The two models therefore are expected to present different attack rates.
The difference in the peak amplitudes decreases for increasing values of the reproductive number, and the same effect is also evident from the curves of the epidemic size. At the end of the epidemic outbreak, the average size predicted by GLEaM ranges from 36% for R_{0} = 1.5 to 56% for R_{0} = 2.3, as compared to the one observed in the agent-based model, which ranges from 26% for R_{0} = 1.5 to 49% for R_{0} = 2.3, with an absolute difference of about 10% for R_{0} = 1.5 and 7% for R_{0} = 2.3. Fluctuations are comparable in the two models, as shown by the shaded areas around the average values, representing the 95% reference ranges obtained from the stochastic runs.
Figure 3. Comparison of the epidemic incidence and size. Incidence profiles and epidemic size for GLEaM and the agent-based model at the global level. Time is expressed in days since the first importation of infected individuals in Italy. Results for three values of the reproductive number are shown from left to right: R_{0} = 1.5, R_{0} = 1.9, R_{0} = 2.3. Average profiles (lines) and 95% CI (shaded areas) are shown.
The subpopulation structure of GLEaM and its coupling with mobility processes preserve accurate timing in different geographical areas. However, when the attack rate is considered we still see differences, as the household and workplace structures are important in differentiating the impact on different age brackets. GLEaM includes a spatial substructure that subdivides the global population into subpopulations around major transportation hubs. Inside each census area the subpopulation is divided into age classes. The frequency of interaction among individuals in different age classes is governed by a specific matrix such that within each age class the individuals are all considered equivalent and a homogeneous assumption is used for the evaluation of the force of infection. The agent-based model is more refined in the definition of the social/spatial/age structure of the population, being defined at the level of the single individual. In this case each individual is tagged with the appropriate social bracket by assigning the household structure, workplace size, etc.
As we will see in the next sections, the main differences between the two models are observed for the 60+ age class. Indeed, this is the age class with the most marked differences in household structure and workplace habits; such differences cannot be taken into consideration at the metapopulation level. It is however difficult to state which of the two predictions is the most accurate. On one hand, the high level of realism of the agent-based model should make the prediction reliable. On the other hand, this high realism is not free of modeling assumptions, as for instance in the definitions of Eqs. (1) and (3). The correct value should lie between the predictions of the two models, as supported by the fact that the difference between the models decreases as R_{0} increases, with the models converging to the same value for the attack rate. For large R_{0}, in fact, the local epidemics (in census areas for GLEaM, and in households/workplaces in the agent-based model) become more widespread across all the layers of the population and thus the differences in the population structure are less relevant. In the Additional File 1 we also report the results for a simple single SLIR population model aligned with the agent-based and metapopulation models. As expected, such a simple model is not able to recover the variability of the incidence profile and the final attack rate of the epidemic.
The peak delay between the two models is defined as the absolute difference between the activity peak times T_{GLEaM} and T_{AB} of the metapopulation and agent-based models, respectively. The difference (T_{GLEaM} - T_{AB}) is expressed in days and calculated for each pair of stochastic realizations. Figure 4 shows the probability distributions of this quantity, calculated for the three values of R_{0} explored. We consider both negative and positive differences, corresponding to one model anticipating the other or vice versa. GLEaM more likely reaches the peak later than the agent-based model, with a most probable delay of about 2-4 days, explaining the very good agreement in the timing observed in Figure 3. Fluctuations around these values are reduced for increasing values of R_{0}, being 3 to 8 days for R_{0} = 1.5 and 2 to 6 days for R_{0} = 2.3, showing how higher transmission scenarios would lead to more synchronized epidemics in the two models.
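The pairwise comparison of peak times can be sketched in a few lines. The sketch takes the activity peak as the day of maximum incidence and computes the signed difference T_{GLEaM} - T_{AB} for each pair of realizations; the incidence curves are invented numbers standing in for paired stochastic runs, and the function names are hypothetical.

```python
def peak_time(incidence):
    """Day index at which a daily incidence series is largest (first maximum on ties)."""
    return max(range(len(incidence)), key=incidence.__getitem__)


def peak_differences(gleam_runs, agent_runs):
    """Signed difference T_GLEaM - T_AB, in days, for each pair of realizations."""
    return [peak_time(g) - peak_time(a) for g, a in zip(gleam_runs, agent_runs)]


# Made-up daily incidence curves standing in for paired stochastic realizations.
gleam_runs = [[0, 2, 5, 9, 6, 3], [1, 3, 4, 8, 10, 4]]
agent_runs = [[0, 3, 7, 6, 4, 1], [1, 4, 6, 9, 5, 2]]
diffs = peak_differences(gleam_runs, agent_runs)
print(diffs)
```

A positive value means that the metapopulation run peaks after the agent-based run fed with the same initial conditions; a histogram of these values over all pairs gives the kind of distribution shown in Figure 4.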
Figure 4. Activity peak difference between the two models. Histogram of the activity peak difference (T_{GLEaM} - T_{AB}), measured in days, between GLEaM and the agent-based model at the global level. The histogram is obtained by comparing each pair of stochastic realizations in the two models and considering negative and positive differences when the GLEaM activity peak occurs before or after that of the agent-based model, respectively. Results for three values of the reproductive number are shown from left to right: R_{0} = 1.5, R_{0} = 1.9, R_{0} = 2.3.
Census area scale
Given the high spatial resolution of both models, it is possible to further investigate differences in the observed epidemic patterns by looking at the results obtained in different spatial regions of Italy. In particular, we focus on the geographical census areas defined in GLEaM and aggregate the simulation results of the agent-based model from the scale of municipalities to the scale of the geographical census areas. Figure 5A reports the average incidence profiles of a selected number of geographical census areas in Italy distributed from North to South, and the large islands. Results are shown for R_{0} = 1.9, whereas additional results for the other two values explored are reported in Additional File 1. The plots show heterogeneous variations in the comparison of the profiles, with geographic census areas where the two models are synchronized and others in which the agent-based profile is shifted before or after the GLEaM profile by a few days. The differences in the peak amplitude also vary across the country. We thus explored possible relations between the observed differences in the timing and size of the epidemic and some features at this resolution scale that are common to both models. In particular, we considered: (i) the North-South position of the geographic census area as indicated by the latitude of its centroid, around which the area was defined in GLEaM through the tessellation procedure; (ii) the population size of the geographic census area; and (iii) the airline traffic of the geographic census area, defined as the number of passengers per day traveling through its airports.
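The aggregation of agent-based results from municipalities to geographical census areas can be sketched as follows. This is only an illustration under stated assumptions: the municipality-to-area mapping is taken as given, all incidence series have the same length, and the identifiers and numbers below are hypothetical rather than actual simulation output.

```python
def aggregate_to_census_areas(municipality_cases, municipality_to_area):
    """Sum daily new cases over all municipalities assigned to each census area."""
    area_cases = {}
    for municipality, series in municipality_cases.items():
        area = municipality_to_area[municipality]
        if area not in area_cases:
            area_cases[area] = [0] * len(series)
        # Element-wise sum; all series are assumed to cover the same days.
        area_cases[area] = [a + b for a, b in zip(area_cases[area], series)]
    return area_cases


def incidence_per_capita(area_cases, area_population):
    """Divide each area's daily cases by its population to get incidence profiles."""
    return {
        area: [cases / area_population[area] for cases in series]
        for area, series in area_cases.items()
    }


# Hypothetical toy input: three municipalities mapped to two census areas.
municipality_cases = {"m1": [1, 4, 2], "m2": [0, 3, 5], "m3": [2, 2, 2]}
municipality_to_area = {"m1": "area_A", "m2": "area_A", "m3": "area_B"}
area_population = {"area_A": 1000, "area_B": 500}

area_cases = aggregate_to_census_areas(municipality_cases, municipality_to_area)
profiles = incidence_per_capita(area_cases, area_population)
print(area_cases)
```

Normalizing by population after the aggregation keeps areas of very different size comparable when their profiles are plotted side by side.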
Figure 5. Epidemic profiles and geography. Geographic variation of the epidemic profiles for GLEaM and the agent-based model at the level of the major urban areas in Italy: a) profiles for a selected number of Italian subpopulations distributed from North to South and in the Islands. Time is expressed in days since the first importation of infected individuals in Italy. Average profiles for the scenario with R_{0} = 1.9 are shown; b) difference of the epidemic size as a fraction of the population size (top row) and peak shift measured in days (bottom row) between GLEaM and the agent-based model at the level of GLEaM geographical census areas as functions of: the latitude of the geographical census area (left); its population size (center); and the traffic of the airport associated with the geographical census area (right). Results for R_{0} = 1.9 are shown.
Results in Figure 5B show that the differences in the epidemic size tend to be stable from North to South, and to decrease with increasing population size and airport traffic. This can be explained by the fact that larger populations and traffic volumes (on average, large population sizes are associated with high-traffic airports [22]) smooth out differences and the effect of fluctuations, which are instead more pronounced in small populations. Turning to the timing, we observe that GLEaM peaks markedly earlier than the agent-based model in the Southern regions (especially in the Islands), with good synchronization in the Center and a small, roughly constant delay in the North of the country. Because the peak shift shows no clear trend with the population size or the airport traffic of the geographical census areas, the results observed with respect to latitude appear to indicate a genuine difference between the two frameworks. Both models consider commuting patterns: GLEaM integrates the commuting network among geographic census areas obtained from the Italian origin-destination commuting data, and the agent-based model integrates a synthetic commuting network among municipalities that reproduces the statistics of commuters throughout the country from coarse-grained information on destination data. Though built on different levels of detail, both commuting networks are expected to reproduce the geographical fluctuations observed in the mobility of the Italian population, with a percentage of commuters increasing from 15% in Southern Italy to 60% in Northern Italy. Long-distance travel seems instead to be responsible for the observed behavior of the peak shift versus latitude. The distance kernel for random contacts in the population considered in the agent-based model might be unable to reproduce some of the complex properties found in air travel flows with North-South heterogeneities. In this respect, the introduction of long-distance travel in the agent-based model [9] could help smooth out differences.
Municipality scale
By increasing the spatial resolution even further, it is possible to monitor the geotemporal spread of the disease at the level of the 8,101 municipalities in the country. The GLEaM results at the level of the geographic census areas are mapped onto the administrative boundaries of the municipalities to be comparable with the simulation results produced by the agent-based model. The observed epidemic pattern is shown in Figure 6 for three different snapshots of the simulations in terms of the average number of new clinical cases per municipality. The visualization confirms the above results, showing a very good agreement of the geographic distribution of cases at the finest resolution scale available.
Figure 6. Geotemporal spreading pattern of the epidemic. Comparison of the spatial epidemic evolution in GLEaM (top) and in the agent-based model (bottom) at three different snapshots of the simulation for R_{0} = 1.9. From left to right, snapshots show: 127 days, 148 days, and 176 days after the first importation of infected individuals in Italy. Maps reproduce the average number of cases at the resolution scale of the Italian municipalities.
Age class breakdown
The age structure of GLEaM comprises 6 age classes, namely 0-5, 6-12, 13-19, 20-39, 40-59, and 60+ years old. Results on the incidence by age obtained with the agent-based model have been aggregated according to the age structure of GLEaM, which allows us to compare the simulation results broken down by age class. Figure 7 shows the epidemic size by age class as obtained by the two models for the three values of R_{0} investigated. In all cases the agreement is higher in the younger age classes (0-5, 6-12, and 13-19 years old), and deviations become more pronounced for the young adult, adult, and older age classes. However, as seen before when considering all age classes, deviations are reduced for increasing values of R_{0}. The largest deviations are observed in the 60+ age class, with an average epidemic size of 28% with GLEaM against 16% with the agent-based model for R_{0} = 1.5; 40% against 27% for R_{0} = 1.9; and 49% against 35% for R_{0} = 2.3. This is indeed the age class with the most marked differences in household structure and workplace habits, which cannot be taken into account at the metapopulation level, thus generating the largest discrepancy between the two models.
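The aggregation of individual-level results into the six GLEaM age classes can be sketched as below. This is a minimal illustration, assuming that the agent-based output is available as (age, infected) pairs with age in completed years; the function names and the toy individuals are hypothetical.

```python
from bisect import bisect_right

# Age classes as listed in the text; lower bounds are in completed years.
AGE_CLASS_LABELS = ["0-5", "6-12", "13-19", "20-39", "40-59", "60+"]
AGE_CLASS_LOWER_BOUNDS = [0, 6, 13, 20, 40, 60]


def age_class(age):
    """Map an age in years to the label of the age class that contains it."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return AGE_CLASS_LABELS[bisect_right(AGE_CLASS_LOWER_BOUNDS, age) - 1]


def epidemic_size_by_class(individuals):
    """Fraction of infected individuals per age class (empty classes are omitted)."""
    totals = {label: 0 for label in AGE_CLASS_LABELS}
    infected = {label: 0 for label in AGE_CLASS_LABELS}
    for age, was_infected in individuals:
        label = age_class(age)
        totals[label] += 1
        infected[label] += int(was_infected)
    return {
        label: infected[label] / totals[label]
        for label in AGE_CLASS_LABELS
        if totals[label] > 0
    }


# Hypothetical toy population of (age, infected) pairs.
individuals = [(3, True), (4, False), (10, True), (65, False),
               (70, True), (80, False), (61, False)]
sizes = epidemic_size_by_class(individuals)
print(sizes)
```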
Figure 7. Cumulative cases in different age brackets. Comparison of epidemic size by age group between GLEaM and the agent-based model for three values of the reproductive number: R_{0} = 1.5 (left), R_{0} = 1.9 (center), and R_{0} = 2.3 (right).
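As a small worked example, the 60+ epidemic sizes quoted in the text can be turned into absolute and relative gaps between the two models. The sketch uses only the percentages reported above; the variable and function names are hypothetical. With these figures the absolute gap stays roughly constant (12 to 14 percentage points), while the ratio of the GLEaM value to the agent-based value shrinks as R_{0} grows, so the reduction in deviations is a statement about relative rather than absolute differences, at least for this age class.

```python
# Average epidemic size (%) in the 60+ class: (GLEaM, agent-based), as quoted in the text.
EPIDEMIC_SIZE_60_PLUS = {1.5: (28, 16), 1.9: (40, 27), 2.3: (49, 35)}


def model_gaps(sizes):
    """Absolute gap (percentage points) and GLEaM/agent-based ratio for each R0."""
    return {
        r0: {"absolute": gleam - agent_based, "ratio": gleam / agent_based}
        for r0, (gleam, agent_based) in sizes.items()
    }


gaps = model_gaps(EPIDEMIC_SIZE_60_PLUS)
for r0, gap in gaps.items():
    print(f"R0 = {r0}: gap = {gap['absolute']} points, ratio = {gap['ratio']:.2f}")
```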
Discussion and conclusions
We studied a structured metapopulation model and an agent-based model to provide a side-by-side comparison of the modeling frameworks and assess the epidemic predictions that they can achieve. Starting from a shared parameterization of the disease progression and using identical initial conditions, we investigated and quantified similarities and differences in the results at different scales of resolution, and related those to the assumptions of the frameworks and to their integrated data. We found the two models to display a very good agreement in the timing of the epidemic, with a very limited variation in the time of the simulated epidemic activity peaks. In the metapopulation approach the fraction of the population affected by the epidemic is larger (by 5% to 10%) than in the agent-based approach. This difference is due to the assumption of homogeneity, and thus the lack of a detailed structure of contacts (besides the age structure), in the metapopulation approach with respect to the agent-based approach.
Our results highlight advantages and disadvantages of the two approaches. On the one hand, the detailed mobility networks considered in the metapopulation scheme provide an accurate description of the spreading pattern of the unfolding epidemic, identifying the major channels of transportation responsible for spreading the disease at the global level and quantifying the seeding events. On the other hand, detailed estimations of the impact of the disease at a more local level are hampered by the lower level of detail contained in the metapopulation modeling scheme. The agent-based approach is extremely detailed but suffers from the difficulty of gathering high-confidence datasets for most regions of the world. The good match between the two approaches in predicting the geotemporal spreading pattern of an epidemic suggests the feasibility of a hybrid approach that combines and integrates the two modeling schemes. Thanks to the heterogeneity of the transportation network, the spatiotemporal spread of an epidemic could be predicted at the global scale by employing a metapopulation approach. Taking advantage of the explicit representation of individuals in the model, the impact at a more local scale and the effects of individually-targeted interventions in specific areas could be predicted by employing an agent-based approach.
Competing interests
AV is consulting for, and has a research agreement with, Abbott for the modeling of H1N1. The other authors declare no competing interests.
Authors' contributions
All authors contributed to conceiving, designing, and carrying out the study and to drafting the manuscript.
Acknowledgements
We are grateful to the International Air Transport Association for making the airline commercial flight database available. This work has been partially funded by the NIH R21DA024259 award, the Lilly Endowment grant 2008 1639000 and the DTRA10910039 award to AV; the ECICT contract no. 231807 (EPIWORK) to AV, VC, SM and MA; the ERC Ideas contract n.ERC2007Stg204863 (EPIFOR) to VC; the EC contract n. FET233847 (DYNANETS) to AV, VC, JJR; the ECHEALTH FLUMODCONT project to SM and MA.
References

Riley S: Large-Scale Spatial-Transmission Models of Infectious Disease.
Science 2007, 316:1298-1301.

Coburn BJ, Bradley G, Wagner BG, Blower S: Modeling influenza epidemics and pandemics: insights into the future of swine flu (H1N1).
BMC Medicine 2009, 7:30.

Eubank S, Guclu H, Anil Kumar VS, Marathe MV, Srinivasan A, Toroczkai Z, Wang N: Modelling disease outbreaks in realistic urban social networks.
Nature 2004, 429:180-184.

Longini IM, Nizam A, Xu S, Ungchusak K, Hanshaoworakul W, Cummings D, Halloran ME: Containing pandemic influenza at the source.
Science 2005, 309:1083-1087.

Ferguson NM, Cummings DAT, Cauchemez S, Fraser C, Riley S, Meeyai A, Iamsirithaworn S, Burke DS: Strategies for containing an emerging influenza pandemic in Southeast Asia.
Nature 2005, 437:209-214.

Ferguson NM, Cummings DAT, Fraser C, Cajka JC, Cooley PC, Burke DS: Strategies for mitigating an influenza pandemic.
Nature 2006, 442:448-452.

Germann TC, Kadau K, Longini IM, Macken CA: Mitigation strategies for pandemic influenza in the United States.
Proc Natl Acad Sci USA 2006, 103:5935-5940.

Ciofi degli Atti ML, Merler S, Rizzo C, Ajelli M, Massari M, Manfredi P, Furlanello C, Scalia Tomba G, Iannelli M: Mitigation Measures for Pandemic Influenza in Italy: An Individual Based Model Considering Different Scenarios.
PLoS ONE 2008, 3(3):e1790.

Ajelli M, Merler S: The Impact of the Unstructured Contacts Component in Influenza Pandemic Modeling.
PLoS ONE 2008, 3(1):e1519.

Halloran ME, Ferguson NM, Eubank S, Longini IM, Cummings DAT, Lewis B, Xu S, Fraser C, Vullikanti A, Germann TC, Wagener D, Beckman R, Kadau K, Macken CA, Burke DS, Cooley P: Modeling targeted layered containment of an influenza pandemic in the United States.
Proc Natl Acad Sci USA 2008, 105:4639-4644.

Merler S, Ajelli M, Rizzo C: Age-prioritized use of antivirals during an influenza pandemic.
BMC Infectious Diseases 2009, 9:119.

Merler S, Ajelli M: The role of population heterogeneity and human mobility in the spread of pandemic influenza.
Proc Royal Soc B 2010, 277:557-565.

Davey VJ, Glass RJ, Min HJ, Beyeler WE, Glass LM: Effective, robust design of community mitigation for pandemic influenza: a systematic examination of proposed US guidance.
PLoS ONE 2008, 3:e2606.

Ajelli M, Merler S: An individual-based model of hepatitis A transmission.
Journal of Theoretical Biology 2009, 259:478-488.

Rvachev LA, Longini IM: A mathematical model for the global spread of influenza.
Mathematical Biosciences 1985, 75:3-22.

Grais RF, Hugh Ellis J, Glass GE: Assessing the impact of airline travel on the geographic spread of pandemic influenza.
Eur J Epidemiol 2003, 18:1065-1072.

Hufnagel L, Brockmann D, Geisel T: Forecast and control of epidemics in a globalized world.
Proc Natl Acad Sci USA 2004, 101:15124-15129.

Colizza V, Barrat A, Barthelemy M, Valleron AJ, Vespignani A: Modeling the Worldwide spread of pandemic influenza: baseline case and containment interventions.
PLoS Medicine 2007, 4:e13.

Flahault A, Valleron AJ: A method for assessing the global spread of HIV-1 infection based on air-travel.

Cooper BS, Pitman RJ, Edmunds WJ, Gay NJ: Delaying the international spread of pandemic influenza.
PLoS Medicine 2006, 3:e12.

Epstein JM, Goedecke DM, Yu F, Morris RJ, Wagener DK, Bobashev GV: Controlling Pandemic Flu: The Value of International Air Travel Restrictions.
PLoS ONE 2007, 2:e401.

Colizza V, Barrat A, Barthelemy M, Vespignani A: The role of the airline transportation network in the prediction and predictability of global epidemics.
Proc Natl Acad Sci USA 2006, 103:2015-2020.

Flahault A, Vergu E, Coudeville L, Grais R: Strategies for containing a global influenza pandemic.
Vaccine 2006, 24:6751-6755.

Balcan D, Hu H, Goncalves B, Bajardi P, Poletto C, Ramasco JJ, Paolotti D, Perra N, Tizzoni M, Van den Broeck W, Colizza V, Vespignani A: Seasonal transmission potential and activity peaks of the new influenza A(H1N1): a Monte Carlo likelihood analysis based on human mobility.
BMC Medicine 2009, 7:45.

Balcan D, Colizza V, Gonçalves B, Hu H, Ramasco JJ, Vespignani A: Multiscale mobility networks and the large scale spreading of infectious diseases.
Proc Natl Acad Sci USA 2009, 106:21484-21489.

Bauch CT, Lloyd-Smith JO, Coffee MP, Galvani AP: Dynamically modeling SARS and other newly emerging respiratory illnesses: past, present, future.
Epidemiol 2005, 16:791-801.

Colizza V, Barrat A, Barthelemy M, Vespignani A: Predictability and epidemic pathways in global outbreaks of infectious diseases: the SARS case study.
BMC Medicine 2007, 5:34.

Ajelli M, Merler S, Pugliese A, Rizzo C: Model predictions and evaluation of possible control strategies for the 2009 A/H1N1v influenza pandemic in Italy.
Epidemiol Infect 2010, 14:112.

Italian Institute of Statistics: XIV Censimento generale della popolazione e delle abitazioni. [http://dawinci.istat.it/MD/] 2001. (in Italian)

Italian Institute of Statistics: Strutture familiari e opinioni su famiglia e figli. [http://www.istat.it/dati/catalogo/20060621_03] 2003. (in Italian)

Italian Institute of Statistics: VIII Censimento generale dell'industria e dei servizi. [http://dwcis.istat.it/cis/index.htm] 2001. (in Italian)

Italian Ministry of University and Research: La scuola in cifre. [http://statistica.miur.it/ustat/documenti/pub2005/index.asp] 2005. (in Italian)

Erlander S, Stewart NF: The gravity model in transportation analysis.

Ortúzar J de D, Willumsen LG: Modelling Transport. Chichester, UK: John Wiley and Sons; 2001.

Viboud C, Bjornstad O, Smith DL, Simonsen L, Miller MA, Grenfell BT: Synchrony, waves, and spatial hierarchies in the spread of influenza.
Science 2006, 312:447-451.

International Air Transport Association [http://www.iata.org]

Barrat A, Barthelemy M, Pastor-Satorras R, Vespignani A: The architecture of complex weighted networks.
Proc Natl Acad Sci USA 2004, 101:3747-3752.

Center for International Earth Science Information Network (CIESIN), Columbia University; and Centro Internacional de Agricultura Tropical (CIAT): The Gridded Population of the World Version 3 (GPWv3): Population Grids. In Palisades, NY: Socioeconomic Data and Applications Center (SEDAC). Columbia University;

Colizza V, Barrat A, Barthelemy M, Vespignani A: The modeling of global epidemics: Stochastic dynamics and predictability.
Bull Math Bio 2006, 68:1893-1921.

Keeling MJ, Rohani P: Estimating spatial coupling in epidemiological systems: a mechanistic approach.
Ecology Letters 2002, 5:20-29.

Sattenspiel L, Dietz K: A structured epidemic model incorporating geographic mobility among regions.
Math. Biosci 1995, 128:71-91.

Wallinga J, Teunis P, Kretzschmar M: Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents.
American Journal of Epidemiology 2006, 164:936-944.

Longini IM, Halloran ME, Nizam A, Yang Y: Containing pandemic influenza with antiviral agents.
American Journal of Epidemiology 2004, 159:623-633.

Anderson RM, May RM: Infectious Diseases in Humans. Oxford Univ. Press, Oxford; 1992.

Diekmann O, Heesterbeek JAP: Mathematical epidemiology of infectious diseases: Model building, analysis and interpretation. New York: John Wiley and Sons; 2000:303.

Mills CE, Robins JM, Lipsitch M: Transmissibility of 1918 pandemic influenza.
Nature 2004, 432:904-906.

Chao DL, Halloran ME, Obenchain VJ, Longini IM: FluTE, a publicly available stochastic influenza epidemic simulation model.
PLoS Comput Biol 2010, 6:e1000656.

Watts DJ, Muhamad R, Medina DC, Dodds PS: Multiscale, resurgent epidemics in a hierarchical metapopulation model.
Proc Natl Acad Sci USA 2005, 102:11157-11162.

Stroud PD, Sydoriak SJ, Riese JM, Smith JP, Mniszewski SM, Romero PR: Semi-empirical power-law scaling of new infection rate to model epidemic dynamics with inhomogeneous mixing.
Mathematical Biosciences 2006, 203:301-318.

Pastor-Satorras R, Vespignani A: Epidemic spreading in scale-free networks.
Phys Rev Lett 2001, 86:3200-3203.

Lloyd AL, May RM: How viruses spread among computers and people.
Science 2001, 292:1316-1317.