Complex Networks Lagrange Laboratory (CNLL), Institute for Scientific Interchange (ISI) Foundation, Turin, Italy

Unité Mixte de Recherche du CNRS UMR 8627, Bâtiment 210, Univ Paris-Sud, F-91405 Orsay, France

CEA-DIF Centre d'Etudes de Bruyères-Le-Châtel, BP12, F-91680, France

School of Informatics and Center for Biocomplexity, Indiana University, Bloomington, IN 47401, USA

Institute for Scientific Interchange (ISI) Foundation, Turin, Italy

Abstract

Background

The global spread of the severe acute respiratory syndrome (SARS) epidemic has clearly shown the importance of considering long-range transportation networks in understanding emerging disease outbreaks. The introduction of extensive transportation data sets is therefore an important step towards epidemic models endowed with realism.

Methods

We develop a general stochastic meta-population model that incorporates actual travel and census data among 3 100 urban areas in 220 countries. The model allows probabilistic predictions on the likelihood of country outbreaks and their magnitude. The level of predictability offered by the model can be quantitatively analyzed and related to the appearance of robust epidemic pathways that represent the most probable routes for the spread of the disease.

Results

In order to assess the predictive power of the model, the case study of the global spread of SARS is considered. The disease parameter values and initial conditions used in the model are evaluated from empirical data for Hong Kong. The outbreak likelihood for specific countries is evaluated along with the emerging epidemic pathways. Simulation results are in agreement with the empirical data of the SARS worldwide epidemic.

Conclusion

The presented computational approach shows that the integration of long-range mobility and demographic data provides epidemic models with a predictive power that can be consistently tested and theoretically motivated. This computational strategy can therefore be considered a general tool for the analysis and forecast of the global spreading of emerging diseases and for the definition of containment policies aimed at reducing the effects of potentially catastrophic outbreaks.

Background

The outbreak of severe acute respiratory syndrome (SARS) in 2002–2003 represented a serious public health threat to the international community. Its rapid spread to regions far away from the initial outbreak created great concern for the potential ability of the virus to affect a large number of countries and required a coordinated effort aimed at its containment.

In this article, we present a stochastic meta-population epidemic model, based on the extension of the deterministic modeling approach to global epidemic diffusion.

Methods

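We adopt a global stochastic meta-population model that considers a set of coupled epidemic transmission models. The approach is in the same spirit as the deterministic models used for the global spread of infectious diseases and their successive stochastic generalizations. In a basic SIR scheme, the population $N_j$ of a city $j$ is $N_j = S_j(t) + I_j(t) + R_j(t)$, where $S_j(t)$, $I_j(t)$ and $R_j(t)$ are the numbers of susceptible, infectious and recovered individuals in the city at time $t$. The compartmental transitions are stochastic: the number of new infections in city $j$ is extracted from a binomial distribution with probability proportional to $\beta I_j(t)/N_j$ and number of trials $S_j(t)$, while the recovery transition $I_j \to R_j$ occurs with rate $\mu I_j(t)$.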
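The binomial extraction of compartmental transitions can be illustrated with a minimal sketch for a single city, assuming a basic SIR scheme. The function name `sir_step`, the parameters `beta`, `mu` and `dt`, and the conversion of rates into per-step probabilities through `1 - exp(-rate * dt)` are assumptions made for illustration, not details taken from the model described here.

```python
import numpy as np

def sir_step(S, I, R, beta, mu, dt, rng):
    """One discrete stochastic SIR step in a single city (illustrative sketch)."""
    N = S + I + R
    # Assumption: rates are turned into per-step probabilities with 1 - exp(-rate * dt),
    # which keeps them inside [0, 1] for any time step.
    p_inf = 1.0 - np.exp(-beta * I / N * dt) if N > 0 else 0.0
    p_rec = 1.0 - np.exp(-mu * dt)
    new_inf = rng.binomial(S, p_inf)   # trials = susceptibles
    new_rec = rng.binomial(I, p_rec)   # trials = infectious
    return S - new_inf, I + new_inf - new_rec, R + new_rec

# Hypothetical usage with made-up numbers.
rng = np.random.default_rng(42)
state = (990, 10, 0)
for _ in range(10):
    state = sir_step(*state, beta=0.5, mu=0.2, dt=1.0, rng=rng)
```

Because both draws use the compartment sizes at the start of the step, the total population of the city is conserved by construction.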

Changing from a basic SIR model to a refined compartmentalization, additional processes ought to be taken into account, such as the possibility of having more than one compartment able to transmit the infection, due for example to the imperfect isolation of quarantined individuals. In the case of SARS, which will be addressed in the following section, the infection dynamics includes the specific characteristics of the disease under study, such as latency, hospitalization, patient isolation, and fatality rate.

Additional figures and materials. The Additional file


**Flow diagram of the transmission model**. The population of each city is classified into seven different compartments, namely susceptible ($S$), latent ($L$), infectious ($I$), hospitalized who either recover ($H_R$) or die ($H_D$), dead ($D$) and recovered ($R$) individuals. Hospital isolation is not perfect, so the transmission rate $\beta$ is reduced by a factor $r_\beta$ for hospitalized patients, with $r_\beta$ = 20% as estimated for the early stage of the epidemic in Hong Kong [9]. The infectiousness of patients in the compartments $H_R$ and $H_D$ is assumed to be equal (although this assumption can easily be changed in the model). Susceptible individuals exposed to SARS enter the latent class. Latents represent infected individuals who are not yet contagious and are assumed to be asymptomatic, as suggested by results based on epidemiologic, clinical and diagnostic data in Canada [40]. They become infectious after an average time $\varepsilon^{-1}$ (mean latency period). An individual is classified as infectious during an average time equal to $\mu^{-1}$, from the onset of clinical symptoms to admission to the hospital, where the patient eventually dies or recovers. Patients admitted to the hospital are not allowed to travel. The average periods spent in the hospital from admission to death or recovery are equal to $\mu_D^{-1}$ and $\mu_R^{-1}$, respectively. The average death rate is denoted by $d$.

Each compartmental model in a given urban area is then coupled to the compartmental models of other urban areas via a stochastic travel operator that identifies the number of individuals in each compartment traveling from the urban area $i$ to the urban area $j$. Each of the $X_i$ potential travellers in a given compartment has a probability $p_{ij} = w_{ij}/N_i$ to go from $i$ to $j$, where $w_{ij}$ is the traffic, according to the data, on a given connection in the considered time scale and $N_i$ is the urban area population. In each city
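A stochastic travel operator of this kind can be sketched as follows, assuming that the travellers leaving each city are distributed over its connections with a multinomial draw whose probabilities are traffic divided by population. The multinomial choice, the function name `travel_step` and the array layout are assumptions made for illustration.

```python
import numpy as np

def travel_step(X, W, N, rng):
    """Move individuals of one compartment between cities (illustrative sketch).

    X: (n,) integer counts of the compartment in each city
    W: (n, n) traffic on each connection per time step, with a zero diagonal
    N: (n,) city populations
    """
    n = len(X)
    X_new = X.copy()
    for i in range(n):
        p_out = W[i] / N[i]                     # probability of going from i to each j
        probs = np.append(p_out, 1.0 - p_out.sum())  # last entry: staying in city i
        moves = rng.multinomial(X[i], probs)    # assumption: multinomial split
        X_new[i] -= moves[:n].sum()             # departures from i
        X_new += moves[:n]                      # arrivals in each destination
    return X_new

# Hypothetical three-city example with made-up traffic and populations.
rng = np.random.default_rng(7)
X = np.array([100, 0, 50])
W = np.array([[0.0, 10.0, 5.0], [10.0, 0.0, 2.0], [5.0, 2.0, 0.0]])
N = np.array([1000, 500, 200])
X_after = travel_step(X, W, N, rng)
```

The sketch requires the total outgoing traffic of a city to be smaller than its population, so that the probability of staying is non-negative. Departures are drawn from the counts at the start of the step, so the order in which cities are processed does not matter.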

The defined model considers stochastic fluctuations both in the individual compartmental transitions and in the traveling events. This implies that in principle each model realization, even with the same initial conditions, may be different from all the others. In this context, the comparison of a single realization of the model with the real evolution of the disease may be very misleading. Similarly, the mere comparison of the number of cases obtained in each country, averaged over several realizations, with the actual number of cases that occurred is a poor indicator of the reliability of the achieved prediction. Indeed in many cases the average would include a large number of occurrences with no outbreaks in a variety of countries. It is therefore crucial to distinguish in each country (or, at a higher degree of resolution, in each urban area) the non-outbreak from the outbreak realizations and to evaluate the number of cases conditional on the occurrence of the latter events. For this reason, we define in the following a set of indicators and analysis tools that can be used to provide scenario forecasts and comparisons with real world data.

Outbreak likelihood and magnitude

The likelihood of experiencing an outbreak can be estimated by analyzing different stochastic occurrences of the epidemic with the same initial conditions, and by evaluating the probability that the infection will reach a given country. In the following we will consider statistics over $10^{3}$ different realizations of the stochastic noise, and define the probability of outbreak in each country as the fraction of realizations that produced a positive number of cases within the country. This allows for the identification of areas at risk of infection, with a corresponding quantitative measure expressed by the outbreak probability. A more quantitative analysis is obtained by inspecting the predicted cumulative number of cases for each country, conditional on the occurrence of an outbreak in the country. The outbreak likelihood and magnitude analysis can be broken down to the level of single urban areas. In the following section we present an example of the results available at this resolution scale.
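The two indicators defined above can be computed from a table of simulated cumulative cases, as in the following minimal sketch. The array layout (one row per realization, one column per country) and the function name `outbreak_statistics` are assumptions made for illustration, and the numbers in the usage example are made up.

```python
import numpy as np

def outbreak_statistics(cases):
    """Outbreak probability and conditional median of cases per country.

    cases: (n_realizations, n_countries) cumulative number of cases.
    """
    cases = np.asarray(cases)
    hit = cases > 0                       # realizations with an outbreak
    prob = hit.mean(axis=0)               # fraction of realizations with cases
    median = np.full(cases.shape[1], np.nan)
    for c in range(cases.shape[1]):
        if hit[:, c].any():
            # Magnitude is evaluated only over outbreak realizations.
            median[c] = np.median(cases[hit[:, c], c])
    return prob, median

# Made-up example: 4 realizations, 2 countries.
prob, median = outbreak_statistics([[0, 5], [2, 0], [4, 0], [0, 0]])
```

Countries that never experience an outbreak in the simulated sample keep a `nan` magnitude, which avoids mixing non-outbreak realizations into the conditional statistics.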

Predictability and epidemic pathways

The very high potential value of forecasting tools, in a planning perspective against emerging infectious diseases, points to the necessity of assessing the accuracy of such epidemic forecasts with respect to the various stochastic elements present in the process. Indeed, the present computational approach provides meaningful predictions only if all stochastic realizations of the epidemic, with the same initial conditions and parameters, are reasonably similar in intensity, locations and time evolution. The airline network structure explicitly incorporated into the model is composed of more than 17 000 different connections among 3 100 cities. Such a large number of connections produces a huge number of possible different paths available for the infection to spread throughout the world. This in principle could easily result in a set of simulated epidemic outbreaks that are very different from one another, though starting from the same initial conditions, thus leading to a poor predictive power for the computational model. By contrast, while the airline network topology tends to lower the predictability of the disease evolution, the heterogeneity of the passenger volume on the various connections defines specific diffusion channels on the high traffic routes. Ultimately, the degree of predictability is determined by the competing effects of connectivity and traffic heterogeneities. Each outbreak realization is characterized at time $t$ by the vector $\vec{\pi}(t)$, with components $\pi_j(t) = I_j(t)/\sum_l I_l(t)$, where $I_j(t)$ is the number of infectious individuals in city $j$, and by the worldwide prevalence $\sum_j I_j(t)/\aleph$, with $\aleph = \sum_j N_j$ being the world population. The overlap function $\Theta(t)$ measures the similarity between two such realizations.
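As an illustration of how the similarity of two outbreak realizations could be quantified, the sketch below assumes a Hellinger-affinity similarity, $\mathrm{sim}(a, b) = \sum_j \sqrt{a_j b_j}$, applied both to the normalized distribution of infectious individuals over cities and to the worldwide prevalence. This specific functional form, and the names `hellinger_affinity` and `overlap`, are assumptions made for illustration rather than a definition taken from the text.

```python
import numpy as np

def hellinger_affinity(a, b):
    """Similarity of two normalized vectors: sum_j sqrt(a_j * b_j)."""
    return float(np.sum(np.sqrt(np.asarray(a) * np.asarray(b))))

def overlap(I1, I2, N):
    """Overlap of two realizations at one time (assumed functional form).

    I1, I2: infectious individuals per city in the two realizations
    N: city populations
    """
    I1, I2, N = (np.asarray(x, dtype=float) for x in (I1, I2, N))
    tot1, tot2, world = I1.sum(), I2.sum(), N.sum()
    if tot1 == 0 or tot2 == 0:
        return 0.0
    pi1, pi2 = I1 / tot1, I2 / tot2            # where the infectious are
    i1 = np.array([tot1 / world, 1.0 - tot1 / world])  # how many there are
    i2 = np.array([tot2 / world, 1.0 - tot2 / world])
    return hellinger_affinity(i1, i2) * hellinger_affinity(pi1, pi2)
```

With this choice the overlap equals 1 when the two realizations have the same number of infectious individuals in the same cities, and 0 when they share no infected city.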

The behavior of the overlap $\Theta(t)$ is linked to the large heterogeneity of traffic (up to $10^{6}$ passengers per year) associated with the air travel connections. In order to pinpoint the presence of epidemic pathways, starting from identical initial conditions, one can simulate different outbreaks subject to different realizations of the stochastic noise and obtain the time evolution of the epidemic in each urban area as described in the main text. During the simulations, one observes the propagation of the virus from one country to the other by means of air travel and thus monitors the path followed by the infection at the country level. At each outbreak realization, it is possible to identify for each country $C_i$ the country $C_j$ that is the origin of the infection and construct the graph of virus propagation; namely, if a latent or an infectious individual travels from $C_j$ to $C_i$ and causes an outbreak in the country $C_i$, not yet infected, a directed link from $C_j$ to $C_i$ is created with weight equal to 1. Once the origin of infection for $C_i$ has been identified, subsequent introductions in $C_i$ are not considered, as we are only interested in the path followed by the disease in infecting a geographical region not yet infected. After a statistically significant number of realizations, a directed weighted network is obtained in which the direction of a link indicates the direction of the virus diffusion and the weight represents the number of times this flow has been observed out of the total number of realizations. For each country $C_i$ we renormalize to 1 the sum of the weights on all incoming links, in order to define the probability of infection on each flow. The network of epidemic pathways is then pruned by deleting all directed links having an occurrence probability less than a given threshold, in order to clearly identify the major pathways along which the epidemic will spread. This information identifies for each country the possible origins of infection and provides a quantitative estimate of the probability of receiving the infection from each identified origin. It is therefore information of crucial importance for the development and assessment of preparation plans of single countries. Travel advisories or limitations and medical screenings at the ports of entry, such as those put in place during the SARS epidemic, might well benefit strongly from the analysis and identification of such epidemic pathways.
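The construction of the pathway network described above (count the first introductions, renormalize incoming weights, prune below a threshold) can be sketched as follows. The input format, one dictionary per realization mapping each infected country to its origin of infection, and the function name `epidemic_pathways` are assumptions made for illustration.

```python
from collections import defaultdict

def epidemic_pathways(origins_per_run, threshold=0.1):
    """Build the pruned network of epidemic pathways (illustrative sketch).

    origins_per_run: list of dicts {infected_country: origin_country},
                     one dict per stochastic realization, recording only
                     the first introduction in each country.
    Returns {(origin, target): probability} for links above the threshold.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for origins in origins_per_run:
        for target, source in origins.items():
            counts[target][source] += 1      # each observed flow has weight 1
    pathways = {}
    for target, sources in counts.items():
        total = sum(sources.values())        # renormalize incoming links to 1
        for source, weight in sources.items():
            prob = weight / total
            if prob >= threshold:            # prune rare pathways
                pathways[(source, target)] = prob
    return pathways

# Made-up example with three realizations and hypothetical country labels.
runs = [{"B": "A", "C": "B"}, {"B": "A", "C": "A"}, {"B": "A"}]
paths = epidemic_pathways(runs, threshold=0.1)
```

The default threshold of 0.1 mirrors a 10% filter; any other value can be passed to explore how robust the main pathways are.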

Results

As a concrete example of the previous modeling approach we analyze the specific case study of the SARS epidemic. Several mathematical models have been developed since the SARS coronavirus was identified. Step functions are assumed for $\mu^{-1}$ and for the scaling of the transmission rate.

Parameter values

| **Parameter** | **Description** | **Baseline value** |
| --- | --- | --- |
| $t_0$ | Initial offset from 21 February (days) | 3* |
| $\beta$ | Rate of transmission | 0.57* |
| $L(t=0)$ | Number of initial latent individuals | 10* |
| $s_f(t)$ | Scaling factor for the rate of transmission | 1.00 (21 February + $t_0$ to 20 March) |
| | | 0.37 (21 March to 9 April) |
| | | 0.06 (10 April to 11 July) |
| $r_\beta$ | Relative infectiousness of patients at the hospital | 0.2 |
| $\varepsilon^{-1}$ | Average latency period (days) | 4.6 |
| $\mu^{-1}(t)$ | Average period from onset of symptoms to admission (days) | 4.84 (21 February + $t_0$ to 25 March) |
| | | 3.83 (25 March to 1 April) |
| | | 3.67 (2 April to 11 July) |
| $\mu_R^{-1}$ | Average period from admission to recovery (days) | 23.5 |
| $\mu_D^{-1}$ | Average period from admission to death (days) | 35.9 |
| $d$ | Case fatality rate | 0.2 |

Baseline values for all epidemiological parameters and initial conditions. Parameters marked with an asterisk (*) are estimated by our model through the fitting procedure described in the main text. The three successive decreasing values for $\mu^{-1}$ model the prompter identification and subsequent isolation of infectious individuals [11]. A step function is also assumed for the scaling factor $s_f(t)$, following the reduction of the reproductive number $R_t$ with respect to $R_0$ during the early stage of the SARS epidemic in Hong Kong [9]. This corresponds to the effective reduction of the reproductive number due to the application of control measures [9].
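The time-dependent parameters in the table can be encoded as step functions of the calendar date, as in the minimal sketch below. The helper names are hypothetical, and because 25 March appears as the boundary of two periods for the onset-to-admission time, the sketch assumes that this day belongs to the later period.

```python
from datetime import date

BETA = 0.57  # baseline rate of transmission from the table

def step_value(day, breakpoints, last_value):
    """Return the value of a step function at a given date.

    breakpoints: list of (first_day_of_next_period, value_before_that_day).
    """
    for next_start, value in breakpoints:
        if day < next_start:
            return value
    return last_value

def scaling_factor(day):
    """Scaling factor for the rate of transmission."""
    return step_value(day, [(date(2003, 3, 21), 1.00), (date(2003, 4, 10), 0.37)], 0.06)

def onset_to_admission_days(day):
    """Average period from onset of symptoms to admission (days)."""
    # Assumption: 25 March is assigned to the second period.
    return step_value(day, [(date(2003, 3, 25), 4.84), (date(2003, 4, 2), 3.83)], 3.67)

def effective_transmission_rate(day):
    """Baseline transmission rate multiplied by the scaling factor."""
    return BETA * scaling_factor(day)
```

For example, `scaling_factor(date(2003, 3, 1))` falls in the first period of the table, while `effective_transmission_rate(date(2003, 4, 15))` uses the last and smallest scaling factor.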

Initial conditions are based on available evidence on the early stages of the outbreak and assume as index patient the first case detected out of mainland China, who arrived in Hong Kong on 21 February 2003. Simulations are seeded $t_0$ days after 21 February with the index patient and a number of initial latent individuals, and run from $t_0$ days after 21 February to 11 July 2003, the date corresponding to the last daily update by the World Health Organization (WHO) on the cumulative number of reported probable cases of SARS.

The values of the transmission rate $\beta$, the number of initial latent individuals and the initial offset of $t_0$ days are determined through a least square fit procedure to optimize the agreement of the stochastic simulation results with Hong Kong data. The advantage with respect to previous approaches is that no closed boundaries are imposed on Hong Kong, allowing for the mobility of individuals traveling in the city and for a decrease of the pool of infectious individuals who leave the city by means of air travel. The optimization gives the following baseline values: $\beta$ = 0.57, 10 initial latent individuals and $t_0$ = 3 days. The corresponding reproductive number, $R_0$ = 2.76, is in agreement with previous estimates.

Outbreak likelihood

In Figure


**Worldwide map representation of the outbreak likelihood as predicted by the stochastic model**. Countries are represented according to the color code, ranging from gray for low outbreak probability to red for high outbreak probability.


**Map representation of the outbreak likelihood within Canada at the urban area resolution scale**. Urban areas are represented according to the color code, ranging from gray for low outbreak probability to red for high outbreak probability. Airports within Canada are also shown.

To proceed further in the comparison with empirical data, we group countries in two categories according to a risk threshold in the outbreak occurrence probability. The no-risk countries are those where the probability of outbreak is lower than the risk threshold. Otherwise the country is defined as at risk. In the following we set a risk threshold of 20%. Small variations of the risk threshold do not substantially alter the obtained results. In particular we show in Additional file

Forecasted number of cases for the countries with an incorrect prediction of outbreak

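| **Country** | **Median** | **90% CI** |
| --- | --- | --- |
| Japan | 83 | 23–228 |
| United Arab Emirates | 6 | 1–36 |
| Bangladesh | 6 | 1–42 |
| Saudi Arabia | 5 | 1–35 |
| Netherlands | 5 | 1–26 |
| Cambodia | 5 | 1–40 |
| Bahrain | 4 | 1–30 |
| Austria | 4 | 1–26 |
| Denmark | 3 | 1–15 |
| Brunei | 3 | 1–16 |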

List of countries that were not infected according to WHO official reports but are predicted as at risk by numerical simulations. Median and 90% CI are reported; results correspond to 11 July 2003.
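The grouping of countries by a risk threshold and the comparison with reported outbreaks can be sketched as follows. The function names, the dictionary-based input and the probabilities in the usage example are hypothetical; only the 20% threshold and the rule that a probability below it means no risk come from the text.

```python
def classify_at_risk(outbreak_prob, threshold=0.2):
    """Countries whose outbreak probability is not lower than the threshold."""
    return {c for c, p in outbreak_prob.items() if p >= threshold}

def compare_with_reports(outbreak_prob, reported, threshold=0.2):
    """Split countries by agreement between forecast and reported outbreaks."""
    countries = set(outbreak_prob)
    reported = set(reported) & countries
    predicted = classify_at_risk(outbreak_prob, threshold)
    return {
        "correct_outbreak": predicted & reported,
        "correct_no_outbreak": countries - predicted - reported,
        "predicted_not_reported": predicted - reported,
        "reported_not_predicted": reported - predicted,
    }

# Made-up probabilities for hypothetical countries X, Y, Z and W.
result = compare_with_reports(
    {"X": 0.95, "Y": 0.35, "Z": 0.05, "W": 0.10},
    reported={"X", "W"},
)
```

The `predicted_not_reported` group corresponds to the kind of list given in the table above: countries predicted as at risk that did not report cases.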

**Map representation of the comparison between numerical results and WHO reported cases**. Countries are considered at risk if the probability of reporting an outbreak, computed on $10^{3}$ different realizations of the stochastic noise, is larger than 20%. In red we represent countries for which model forecasts are in agreement with WHO official reports, distinguishing between correct predictions of outbreak (filled red) and correct predictions of no outbreak (striped red). Forecasts that deviate from observed data are represented in green. Results shown refer to the date of 11 July 2003.

Outbreak magnitude

A more quantitative analysis is obtained by comparing the predicted cumulative number of cases for each country, conditional on the occurrence of an outbreak in the country, with the corresponding empirical data.

**Number of cases by country: comparison with WHO official reports**. Quantitative comparison of forecasted number of cases (conditional on the occurrence of an outbreak) with observed data. Simulated results are represented with a box plot in which the lowest and highest values represent the 90% CI, and the box is delimited by the lower and upper quartiles and reports the value of the median. Red symbols represent WHO official reports and are accompanied by the value of the number of cases for the sake of clarity. (A) Agreement of model predictions with observed data: symbols are compatible with the model predictions. Broken scale and inset are used for the sake of visualization. (B,C) Disagreement of model predictions with observed data: WHO data lie outside the 90% CI obtained from $10^{3}$ numerical simulations. Results are reported in two different plots characterized by two different scales for a better visualization.
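The box-plot quantities and the compatibility check between forecast and observation can be sketched as follows. The sketch assumes that the 90% CI is taken as the interval between the 5th and 95th percentiles of the simulated cases over outbreak realizations; the function names and the usage numbers are hypothetical.

```python
import numpy as np

def conditional_summary(cases):
    """Median, quartiles and 90% interval over outbreak realizations only."""
    cases = np.asarray(cases)
    positive = cases[cases > 0]            # condition on an outbreak occurring
    if positive.size == 0:
        return None
    # Assumption: 90% CI = 5th to 95th percentile of the simulated values.
    low, q1, med, q3, high = np.percentile(positive, [5, 25, 50, 75, 95])
    return {"ci_low": low, "q1": q1, "median": med, "q3": q3, "ci_high": high}

def is_compatible(summary, observed):
    """True if the observed number of cases lies inside the 90% interval."""
    return bool(summary["ci_low"] <= observed <= summary["ci_high"])

# Made-up simulated cases for one country (0 means no outbreak).
summary = conditional_summary(np.arange(0, 101))
inside = is_compatible(summary, 50)
```

A country falls in the agreement panel when `is_compatible` is true for its reported number of cases, and in the disagreement panels otherwise.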

Overlap and epidemic pathways

In order to test the predictability inherent to the model in the SARS case study, we compute the overlap and the epidemic pathways from $10^{3}$ outbreaks starting from the same initial conditions. More precisely, starting from Hong Kong, we follow the propagation of the virus and identify for each infected country $C_i$ the country $C_j$ where the infection came from, thus defining a probability of origin of infection for each country. Results are reported in Figure

**Overlap profile**. The value of the overlap is shown as a function of time, from the initial day of the simulations (21 February 2003) to 11 July 2003. Details on relevant events occurring during the SARS epidemic are shown for reference.

**Map representation of epidemic pathways**. Arrows show the paths followed by the virus in the transmission of the infection from Hong Kong to the other countries. The thickness of the arrows represents the probability associated with a given path, where all paths with probability less than 10% have been filtered out for the sake of simplicity. Two different colors are used: black for paths that transmit the virus directly from the seed, Hong Kong, to the first level of infected countries; gray for paths that start from the first level of infected countries.

Discussion

While the results shown in Figures

It is also worth noting, however, that the countries for which forecasts underestimate the empirical data showed some peculiarities in the evolution of the SARS spread. Taiwan, for instance, experienced an anomalous outbreak explosion after a temporary failure in the infection containment procedures in a single hospital.

Conclusion

The computational approach presented here is the largest scale epidemic model at the worldwide level. Its good agreement with historical data of the SARS epidemic suggests that the transportation and census data used here are the basic ingredients for the forecast and analysis of emerging disease spreading at the global level. A more detailed version of the model including the interplay of different transportation systems, information about the specific conditions experienced by each country and a refined compartmentalization to include variations in the susceptibility and heterogeneity in the infectiousness would clearly represent a further improvement in the

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

All authors conceived the study, collected data and performed experiments for the study, analyzed results and contributed to writing the paper. All authors read and approved the final manuscript.

Acknowledgements

The authors thank the International Air Transport Association for making the commercial airline database available. AB and AV are partially funded by the European Commission-contract 001907 (DELIS). AV is partially funded by the NSF award IIS-0513650.
