This article is part of the supplement: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011)
A minimal descriptor of an ancestral recombinations graph
1 Computational Genomics, IBM T J Watson Research, Yorktown, New York, USA
2 Columbia University, New York, USA
3 The work done during an internship at IBM T J Watson Research Center
BMC Bioinformatics 2011, 12(Suppl 1):S6 doi:10.1186/1471-2105-12-S1-S6Published: 15 February 2011
Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, as well as genetic exchange events, such as recombinations: this captures the (genetic) dynamics of a population evolving over generations.
In this paper, we identify structure-preserving and samples-preserving core of an ARG G and call it the minimal descriptor ARG of G. Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to that of G and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as that of G. We also prove that even an unbounded G has a finite minimal descriptor, that continues to preserve certain (graph-theoretic) properties of G and for an appropriate class of ARGs, our estimate (Eqn 8) as well as empirical observation is that the expected reduction in the number of vertices is exponential.
Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend itself very naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).