Unsupervised clustering of wildlife necropsy data for syndromic surveillance
1 Laboratory Environment and Prediction of Population Health, VetAgro Sup, Veterinary campus of Lyon, 1 avenue Bourgelat, BP 83, F-69280 Marcy-l'Etoile, France
2 Epidemiology Unit, French Agency for Food, Environmental and Occupational Safety, 31 avenue Tony Garnier, F-69364 Lyon Cedex 07, France
BMC Veterinary Research 2010, 6:56 doi:10.1186/1746-6148-6-56
Published: 16 December 2010
The importance of wildlife disease surveillance is increasing because wild animals play a growing role as sources of emerging infectious disease events in humans. Syndromic surveillance methods have been developed as a complement to traditional health data analyses, to allow the early detection of unusual health events. Early detection of these events in wildlife could help to protect the health of domestic animals or humans. This paper aims to define syndromes that could be used for the syndromic surveillance of wildlife health data. Wildlife disease monitoring in France, from 1986 onward, has yielded a large body of diagnostic data from wild animals found dead. The authors sought to identify distinct pathological profiles from these historical data through a global analysis of the recorded necropsy descriptions, and to discuss how these profiles can be used to define syndromes. Given the multiplicity and heterogeneity of the available information, the authors propose constructing syndromic classes through a multivariate statistical analysis and classification procedure that groups cases sharing similar pathological characteristics.
A three-step procedure was applied. First, a multiple correspondence analysis was performed on the necropsy data to reduce them to their principal components. Then, hierarchical ascendant (agglomerative) clustering was used to partition the cases. Finally, the k-means algorithm was applied to consolidate the partition. Nine clusters were identified: three were species- and disease-specific, three were suggestive of specific pathological conditions but not species-specific, two covered a broader pathological condition and one was miscellaneous. The clusters reflected the most distinct and most frequent disease entities on which the surveillance network focused. They could be used to define distinct syndromes characterised by specific post-mortem findings.
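The three-step procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the software, the linkage criterion and the number of retained components are not stated here, so the Ward criterion, the function names (`mca_coordinates`, `cluster_cases`), the parameter values and the toy data are all assumptions made for illustration. The sketch codes categorical findings as integers, computes multiple correspondence analysis coordinates from the indicator matrix, cuts an agglomerative tree into a fixed number of groups, and then uses the group centroids to start k-means.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2


def mca_coordinates(codes, n_components):
    """Row principal coordinates of a multiple correspondence analysis.

    codes: (n_cases, n_variables) integer array of category codes.
    """
    # Build the indicator (one-hot) matrix, one block of columns per variable.
    blocks = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, j][:, None] == levels[None, :]).astype(float))
    indicator = np.hstack(blocks)

    # Correspondence analysis of the indicator matrix via SVD of the
    # standardized residuals.
    P = indicator / indicator.sum()
    row_mass = P.sum(axis=1)
    col_mass = P.sum(axis=0)
    expected = np.outer(row_mass, col_mass)
    residuals = (P - expected) / np.sqrt(expected)
    U, s, _ = np.linalg.svd(residuals, full_matrices=False)
    return (U[:, :n_components] * s[:n_components]) / np.sqrt(row_mass)[:, None]


def cluster_cases(codes, n_components, n_clusters):
    """MCA, then agglomerative clustering, then k-means consolidation."""
    coords = mca_coordinates(codes, n_components)

    # Step 2: agglomerative clustering (Ward criterion is an assumption).
    tree = linkage(coords, method="ward")
    initial = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Step 3: k-means started from the centroids of the hierarchical groups.
    centroids = np.vstack(
        [coords[initial == g].mean(axis=0) for g in np.unique(initial)]
    )
    _, labels = kmeans2(coords, centroids, minit="matrix")
    return labels


# Hypothetical toy data: 8 cases described by 3 binary findings.
# The first two findings separate two groups; the third varies within groups.
toy_codes = np.array([
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
    [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1],
])
labels = cluster_cases(toy_codes, n_components=2, n_clusters=2)
print(labels)
```

Starting k-means from the hierarchical centroids, rather than from random points, keeps the final partition close to the tree cut while letting individual cases move to a nearer centroid. With real necropsy data, the number of components and clusters would have to be chosen from the data rather than fixed in advance as in this toy example.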
The chosen statistical clustering method proved to be a useful tool for retrospectively grouping cases from the database into distinct and meaningful pathological entities. Syndrome definition from post-mortem findings is potentially useful for early outbreak detection because it uses the earliest available information on disease in wildlife. Furthermore, the proposed typology allows each case to be attributed to a syndrome, thus enabling the exhaustive surveillance of health events through time series analyses.