Boosting forward-time population genetic simulators through genotype compression
Department of Computer Science, Rice University, Houston, USA
BMC Bioinformatics 2013, 14:192 doi:10.1186/1471-2105-14-192Published: 14 June 2013
Forward-time population genetic simulations play a central role in deriving and testing evolutionary hypotheses. Such simulations may be data-intensive, depending on the settings to the various parameters controlling them. In particular, for certain settings, the data footprint may quickly exceed the memory of a single compute node.
We develop a novel and general method for addressing the memory issue inherent in forward-time simulations by compressing and decompressing, in real-time, active and ancestral genotypes, while carefully accounting for the time overhead. We propose a general graph data structure for compressing the genotype space explored during a simulation run, along with efficient algorithms for constructing and updating compressed genotypes which support both mutation and recombination. We tested the performance of our method in very large-scale simulations. Results show that our method not only scales well, but that it also overcomes memory issues that would cripple existing tools.
As evolutionary analyses are being increasingly performed on genomes, pathways, and networks, particularly in the era of systems biology, scaling population genetic simulators to handle large-scale simulations is crucial. We believe our method offers a significant step in that direction. Further, the techniques we provide are generic and can be integrated with existing population genetic simulators to boost their performance in terms of memory usage.