This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2010
FluReF, an automated flu virus reassortment finder based on phylogenetic trees
Department of Computer Science, EPFL (Swiss Federal Institute of Technology), Lausanne, CH-1015, Switzerland
BMC Genomics 2011, 12(Suppl 2):S3 doi:10.1186/1471-2164-12-S2-S3
Published: 27 July 2011
Reassortments are events in the evolution of the influenza (flu) genome in which genome segments are exchanged between different strains. As reassortments have been implicated in major human pandemics of the last century, their identification has become a health priority. While such identification can be done “by hand” on a small dataset, researchers and health authorities are building up enormous databases of genomic sequences for every flu strain, which makes automated identification methods imperative. However, current methods are limited to pairwise segment comparisons.
We present FluReF, a fully automated flu virus reassortment finder. FluReF is inspired by the visual approach to reassortment identification and uses the reconstructed phylogenetic trees of the individual segments and of the full genome. We also present a simple flu evolution simulator, based on the current source-sink hypothesis for flu cycles. On synthetic datasets produced by our simulator, FluReF, tuned for a 0% false positive rate, yielded false negative rates of less than 10%. FluReF corroborated two new reassortments identified by visual analysis of 75 human H3N2 New York flu strains from 2005–2008 and gave partial verification of reassortments found using another bioinformatics method.
FluReF finds reassortments by a bottom-up search of the full-genome and segment-based phylogenetic trees for candidate clades—groups of one or more sampled viruses that are separated from the other variants from the same season. Candidate clades in each tree are tested to guarantee confidence values, using the lengths of key edges as well as other tree parameters; clades with reassortments must have validated incongruencies among segment trees.
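As a rough illustration of the incongruence idea described above, the following Python sketch compares where a strain sits in each segment tree with where it sits in a reference (full-genome) tree. It is not FluReF's implementation: the nested-tuple tree encoding, the function names (`clades`, `smallest_enclosing_clade`, `incongruent_segments`) and the toy strains are all assumptions made for this example, and the sketch omits the edge-length tests and confidence values that FluReF uses to validate candidate clades.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical encoding: a leaf is a strain name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every node, listed bottom-up (children before parents)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    left_clades = clades(left)
    right_clades = clades(right)
    # The last entry of each child list is that child's complete leaf set.
    return left_clades + right_clades + [left_clades[-1] | right_clades[-1]]


def smallest_enclosing_clade(tree: Tree, strain: str) -> FrozenSet[str]:
    """Smallest clade with more than one leaf that contains `strain`."""
    for clade in clades(tree):  # bottom-up order, so the first hit is smallest
        if strain in clade and len(clade) > 1:
            return clade
    raise ValueError(f"{strain!r} has no enclosing clade in this tree")


def incongruent_segments(
    reference: Tree, segment_trees: Dict[str, Tree], strain: str
) -> List[str]:
    """Segments whose tree groups `strain` differently from the reference."""
    expected = smallest_enclosing_clade(reference, strain)
    return sorted(
        name
        for name, tree in segment_trees.items()
        if smallest_enclosing_clade(tree, strain) != expected
    )


# Toy data (invented): strain B groups with A everywhere except in the NA tree.
full_genome = (("A", "B"), ("C", "D"))
segments = {
    "HA": (("A", "B"), ("C", "D")),
    "NA": (("A", "C"), ("B", "D")),
}
print(incongruent_segments(full_genome, segments, "B"))
```

In this toy case only the NA tree places strain B with a different neighbor, so NA is the segment flagged as a reassortment candidate. A real analysis would additionally have to decide whether such a topological difference is supported strongly enough to be trusted.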
FluReF demonstrates robustness of prediction for geographically and temporally expanded datasets, and is not limited to finding reassortments with previously collected sequences. The complete source code is available from http://lcbb.epfl.ch/software.html.