This article is part of the supplement: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics

Open Access Proceedings

From event-labeled gene trees to species trees

Maribel Hernandez-Rosales1,2*, Marc Hellmuth3, Nicolas Wieseke4,5, Katharina T Huber6, Vincent Moulton6 and Peter F Stadler1,2,7,8

Author Affiliations

1 Max-Planck-Institute for Mathematics in the Sciences, Leipzig, D-04103, Germany

2 Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, D-04107, Germany

3 Center for Bioinformatics, Saarland University, Saarbrücken, D-66041, Germany

4 High Throughput Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller Universität Jena, Jena, D-07743, Germany

5 Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, Leipzig, D-04103, Germany

6 School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK

7 Inst. f. Theoretical Chemistry, University of Vienna, Vienna, A-1090, Austria

8 Santa Fe Institute, Santa Fe, NM, 87501, USA

BMC Bioinformatics 2012, 13(Suppl 19):S6  doi:10.1186/1471-2105-13-S19-S6

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/13/S19/S6


Published: 19 December 2012

© 2012 Hernandez-Rosales et al.; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Tree reconciliation problems have long been studied in phylogenetics. A particular variant of the reconciliation problem for a gene tree T and a species tree S assumes that for each interior vertex x of T it is known whether x represents a speciation or a duplication. This problem appears in the context of analyzing orthology data.

Results

We show that S is a species tree for T if and only if S displays all rooted triples of T that have three distinct species as their leaves and are rooted in a speciation vertex. A valid reconciliation map can then be found in polynomial time. Simulated data show that event-labeled gene trees convey a large amount of information on the underlying species trees, even for a large percentage of losses.

Conclusions

The knowledge of event labels in a gene tree strongly constrains the possible species trees and, for a given species tree, also the possible reconciliation maps. Nevertheless, many degrees of freedom remain in the space of feasible solutions. In order to disambiguate the alternative solutions, additional external constraints as well as optimization criteria could be employed.

Background

The reconstruction of the evolutionary history of a gene family is necessarily based on at least three interrelated types of information. The true phylogeny of the investigated species is required as a scaffold with which the associated gene tree must be reconcilable. Orthology or paralogy of genes found in different species determines whether an internal vertex in the gene tree corresponds to a duplication or a speciation event. Speciation events, in turn, are reflected in the species tree.

The reconciliation of gene and species trees is a widely studied problem [1-10]. In most practical applications, however, neither the gene tree nor the species tree can be determined unambiguously.

Although orthology information is often derived from the reconciliation of a gene tree with a species tree (cf. e.g. TreeFam [11], PhyOP [12], PHOG [13], EnsemblCompara GeneTrees [14], and MetaPhOrs [15]), recent benchmark studies [16] have shown that orthology can also be inferred at similar levels of accuracy, and without the need to construct trees, by clustering-based approaches such as OrthoMCL [17], the algorithms underlying the COG database [18,19], InParanoid [20], or ProteinOrtho [21]. In [22] we have therefore addressed the question: how much information about the gene tree, the species tree, and their reconciliation is already contained in the orthology relation between genes?

According to Fitch's definition [23], two genes are (co-)orthologous if their last common ancestor in the gene tree represents a speciation event. Otherwise, i.e., when their last common ancestor is a duplication event, they are paralogs. The orthology relation on a set of genes is therefore determined by the gene tree T and an "event labeling" that identifies each interior vertex of T as either a duplication or a speciation event. (We disregard here additional types of events such as horizontal transfer and refer to [22] for details on how such extensions might be incorporated into the mathematical framework.) One of the main results of [22], which relies on the theory of symbolic ultrametrics developed in [24], is the following: a relation on a set of genes is an orthology relation (i.e., it derives from some event-labeled gene tree) if and only if it is a cograph (for several equivalent characterizations of cographs see [25]). Note that the cograph does not contain the full information on the event-labeled gene tree. Instead the cograph is equivalent to the gene tree's homomorphic image obtained by collapsing adjacent events of the same type [22]. The orthology relation thus places strong and easily interpretable constraints on the gene tree.
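To make Fitch's definition and the collapsing of adjacent events concrete, here is a minimal Python sketch. It assumes a hypothetical encoding in which a gene tree is a child → parent dictionary (the root maps to None) and interior vertices carry the labels "spec" (speciation) or "dup" (duplication); the function names are illustrative and do not come from [22]. The sketch computes the orthology relation and checks, on a toy tree, that contracting an edge between two equally labeled interior vertices leaves the relation unchanged.

```python
from itertools import combinations

def path_to_root(parent, v):
    """Vertices on the path from v up to the root, v first."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def lca(parent, x, y):
    """Last common ancestor of x and y in a tree stored as a child -> parent dict."""
    above_x = set(path_to_root(parent, x))
    return next(v for v in path_to_root(parent, y) if v in above_x)

def orthology_relation(parent, event, genes):
    """Fitch: x and y are orthologs iff lca(x, y) is a speciation ('spec')."""
    return {frozenset((x, y)) for x, y in combinations(sorted(genes), 2)
            if event[lca(parent, x, y)] == "spec"}

def collapse(parent, event):
    """Contract every edge joining two interior vertices that carry the same label."""
    parent = dict(parent)
    while True:
        same = [v for v, p in parent.items()
                if p is not None and v in event and event[v] == event[p]]
        if not same:
            return parent
        v = same[0]
        p = parent[v]
        # re-attach the children of v to p, then drop v
        parent = {w: (p if q == v else q) for w, q in parent.items() if w != v}

# Toy gene tree (((a1, b1)spec, a2)dup, a3)dup with two adjacent duplications
T = {"r": None, "d": "r", "u": "d", "a1": "u", "b1": "u", "a2": "d", "a3": "r"}
event = {"r": "dup", "d": "dup", "u": "spec"}
genes = ["a1", "b1", "a2", "a3"]
C = collapse(T, event)
assert orthology_relation(T, event, genes) == {frozenset({"a1", "b1"})}
assert "d" not in C and C["u"] == "r" and C["a2"] == "r"
assert orthology_relation(C, event, genes) == orthology_relation(T, event, genes)
```

Contracting such an edge never changes the label found at the last common ancestor of any pair of genes, which is why the relation survives the collapse.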

This observation suggests that a viable approach to reconstructing histories of large gene families may start from an empirically determined orthology relation, which can be directly adjusted to conform to the requirement of being a cograph. The result is then equivalent to a (usually incompletely resolved) event-labeled gene tree, which might be refined or used as a constraint in the inference of a fully resolved gene tree. In this contribution we are concerned with the next conceptual step: the derivation of a species tree from an event-labeled gene tree. As we shall see below, this problem is much simpler than the full tree reconciliation problem. Technically, we will approach this problem by reducing the reconciliation map from gene tree to species tree to rooted triples of genes residing in three distinct species. This is related to an approach that was developed in [26] for addressing the full tree reconciliation problem.

Methods

Definitions and notation

Phylogenetic trees

A phylogenetic tree $T$ (on $L$) is a rooted tree $T = (V, E)$ with leaf set $L \subseteq V$, set of directed edges $E$, and set of interior vertices $V^0 = V \setminus L$, that does not contain any vertices with in- and outdegree one and whose root $\rho_T \in V$ has indegree zero. In order to avoid uninteresting trivial cases, we assume that $|L| \geq 3$. The ancestor relation $\preceq_T$ on $V$ is the partial order defined, for all $x, y \in V$, by $x \preceq_T y$ whenever $y$ lies on the path from $x$ to the root. If there is no danger of ambiguity, we will write $x \preceq y$ rather than $x \preceq_T y$. Furthermore, we write $x \prec y$ to mean $x \preceq y$ and $x \neq y$.
For $x \in V$, we write $L(x) := \{y \in L \mid y \preceq_T x\}$ for the set of leaves in the subtree $T(x)$ of $T$ rooted in $x$. Thus, $L(\rho_T) = L$ and $T(\rho_T) = T$. For $x, y \in V$ such that $x$ and $y$ are joined by an edge $e \in E$, we write $e = [x, y]$. Two phylogenetic trees $T = (V, E)$ and $T' = (V', E')$ on $L$ are said to be equivalent if there exists a bijection from $V$ to $V'$ that is the identity on $L$, maps $\rho_T$ to $\rho_{T'}$, and extends to a graph isomorphism between $T$ and $T'$. A refinement of a phylogenetic tree $T$ on $L$ is a phylogenetic tree $T'$ on $L$ such that $T$ can be obtained from $T'$ by collapsing edges (see e.g. [27]). Suppose for the remainder of this section that $T = (V, E)$ is a phylogenetic tree on $L$ with root $\rho_T$. For a non-empty subset of leaves $A \subseteq L$, we define $\mathrm{lca}_T(A)$, the most recent common ancestor of $A$, to be the unique vertex in $T$ that is the least upper bound of $A$ under the partial order $\preceq_T$. In case $A = \{x, y\}$, we put $\mathrm{lca}_T(x, y) := \mathrm{lca}_T(\{x, y\})$, and if $A = \{x, y, z\}$, we put $\mathrm{lca}_T(x, y, z) := \mathrm{lca}_T(\{x, y, z\})$. For later reference, we note that $x = \mathrm{lca}_T(L(x))$ for all $x \in V$. Let $L' \subseteq L$ be a subset of leaves of $T$ with $|L'| \geq 2$. We denote by $T(L') = T(\mathrm{lca}_T(L'))$ the (rooted) subtree of $T$ with root $\mathrm{lca}_T(L')$. Note that $T(L')$ may have leaves that are not contained in $L'$.
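The ancestor relation, the leaf sets $L(x)$, and $\mathrm{lca}_T(A)$ translate directly into code. The following Python sketch assumes a hypothetical representation of $T$ as a child → parent dictionary (root ↦ None); the function names are illustrative. It also checks the identity $x = \mathrm{lca}_T(L(x))$ on a small example.

```python
def path_to_root(parent, x):
    """Vertices on the path from x to the root, x first."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def preceq(parent, x, y):
    """x ⪯_T y holds iff y lies on the path from x to the root."""
    return y in path_to_root(parent, x)

def leaf_set(parent, x):
    """L(x) = {y in L : y ⪯_T x}; leaves are vertices that are nobody's parent."""
    interior = {p for p in parent.values() if p is not None}
    return {v for v in parent if v not in interior and preceq(parent, v, x)}

def lca(parent, A):
    """lca_T(A): the ⪯_T-smallest vertex lying above every element of the non-empty set A."""
    A = list(A)
    common = set(path_to_root(parent, A[0]))
    for a in A[1:]:
        common &= set(path_to_root(parent, a))
    # common ancestors form a chain; the first one met going up from A[0] is the lowest
    return next(v for v in path_to_root(parent, A[0]) if v in common)

# T = ((a, b), c) with root r
T = {"r": None, "w": "r", "a": "w", "b": "w", "c": "r"}
assert leaf_set(T, "w") == {"a", "b"}
assert lca(T, {"a", "c"}) == "r"
assert all(lca(T, leaf_set(T, x)) == x for x in T)
```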
The restriction $T|_{L'}$ of $T$ to $L'$ is the phylogenetic tree with leaf set $L'$ obtained from $T$ by first forming the minimal spanning tree in $T$ with leaf set $L'$ and then suppressing all vertices of degree two, with the exception of $\rho_T$ if $\rho_T$ is a vertex of that tree. A phylogenetic tree $T'$ on some subset $L' \subseteq L$ is said to be displayed by $T$ (or, equivalently, $T$ displays $T'$) if $T'$ is equivalent to $T|_{L'}$. A set $\mathcal{T}$ of phylogenetic trees, each tree $T' \in \mathcal{T}$ having leaf set $L_{T'}$, is called consistent if $\mathcal{T} = \emptyset$ or there is a phylogenetic tree $T$ on $\bigcup_{T' \in \mathcal{T}} L_{T'}$ that displays $\mathcal{T}$, that is, $T$ displays every tree contained in $\mathcal{T}$. Note that a consistent set of phylogenetic trees is sometimes also called compatible (see e.g. [27]).
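Restriction and display can also be sketched in Python. The sketch below (hypothetical names, child → parent dictionaries) builds $T|_{L'}$ from the paths of the leaves in $L'$ up to $\mathrm{lca}_T(L')$ and then suppresses every vertex with a single child, keeping the root of the spanning tree. To decide equivalence it compares cluster systems $\{L(x)\}$; this test is swapped in here on the basis of the standard fact that a rooted phylogenetic tree is determined up to equivalence by its clusters, and is not taken from the text.

```python
def path_to_root(parent, x):
    """Vertices on the path from x to the root, x first."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def leaves_of(parent):
    """Leaves are the vertices that are nobody's parent."""
    interior = {p for p in parent.values() if p is not None}
    return [v for v in parent if v not in interior]

def restriction(parent, keep):
    """T|_{L'} for a leaf subset `keep` with |L'| >= 2, as a child -> parent dict."""
    paths = [path_to_root(parent, x) for x in keep]
    common = set(paths[0]).intersection(*paths[1:])
    top = next(v for v in paths[0] if v in common)           # lca_T(L')
    children = {}
    for path in paths:                                        # minimal spanning tree
        for child, par in zip(path, path[1:]):
            if child == top:
                break
            children.setdefault(par, set()).add(child)
    # suppress single-child vertices: keep leaves, the root, and branching vertices
    kept = set(keep) | {top} | {v for v, ch in children.items() if len(ch) >= 2}
    new_parent = {top: None}
    for v in kept - {top}:
        u = parent[v]
        while u not in kept:
            u = parent[u]
        new_parent[v] = u
    return new_parent

def clusters(parent):
    """Cluster system {L(x) : x in V}."""
    leaves = leaves_of(parent)
    return {frozenset(x for x in leaves if v in path_to_root(parent, x)) for v in parent}

def displays(parent, other):
    """T displays T' iff T restricted to the leaves of T' has the same clusters as T'."""
    return clusters(restriction(parent, leaves_of(other))) == clusters(other)

T = {"r": None, "u": "r", "v": "r", "a": "u", "b": "u", "c": "v", "d": "v"}
assert restriction(T, ["a", "b", "c"]) == {"r": None, "u": "r", "a": "u", "b": "u", "c": "r"}
assert displays(T, {"x": None, "y": "x", "a": "y", "b": "y", "c": "x"})      # ((a, b), c)
assert not displays(T, {"x": None, "y": "x", "a": "y", "c": "y", "b": "x"})  # ((a, c), b)
```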

It will be convenient for our discussion below to extend the ancestor relation $\preceq_T$ on $V$ to the union $V \cup E$ of the vertex and edge sets of $T$. More precisely, for a vertex $x \in V$ and a directed edge $e = [u, v] \in E$, we put $x \preceq_T e$ if and only if $x \preceq_T v$, and $e \preceq_T x$ if and only if $u \preceq_T x$. For edges $e = [u, v]$ and $f = [a, b]$ in $T$, we put $e \preceq_T f$ if and only if $v \preceq_T b$.
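The extension of $\preceq_T$ to edges fits into a single comparison function. In this hypothetical Python encoding, vertices are strings and an edge $[u, v]$ is the tuple (u, v) with u the parent of v, matching the rules $x \preceq_T e \iff x \preceq_T v$ and $e \preceq_T x \iff u \preceq_T x$ stated above.

```python
def path_to_root(parent, x):
    """Vertices on the path from x to the root, x first."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def vertex_le(parent, x, y):
    """x ⪯_T y on vertices: y lies on the path from x to the root."""
    return y in path_to_root(parent, x)

def le(parent, p, q):
    """Extended order on V ∪ E; an edge [u, v] is the tuple (u, v) with u the parent of v."""
    if isinstance(p, tuple) and isinstance(q, tuple):
        return vertex_le(parent, p[1], q[1])   # [u, v] ⪯ [a, b]  iff  v ⪯ b
    if isinstance(q, tuple):
        return vertex_le(parent, p, q[1])      # x ⪯ [u, v]  iff  x ⪯ v
    if isinstance(p, tuple):
        return vertex_le(parent, p[0], q)      # [u, v] ⪯ x  iff  u ⪯ x
    return vertex_le(parent, p, q)

T = {"r": None, "w": "r", "a": "w", "b": "w", "c": "r"}
assert le(T, "a", ("w", "a")) and le(T, ("w", "a"), "w")
assert not le(T, ("w", "a"), "a")
assert le(T, ("w", "a"), ("r", "w")) and not le(T, ("r", "w"), ("w", "a"))
```

An edge thus sits strictly between its two endpoints: its lower endpoint lies below it and it lies below its upper endpoint.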

Rooted triples

Rooted triples are phylogenetic trees on three leaves with precisely two interior vertices. Sometimes also called rooted triplets [28], they constitute an important concept in the context of supertree reconstruction [27,29] and will also play a major role here. Suppose $L = \{x, y, z\}$. Then we denote by $((x, y), z)$ the triple $r$ with leaf set $L$ for which the path from $x$ to $y$ does not intersect the path from $z$ to the root $\rho_r$, so that $\mathrm{lca}_r(x, y) \prec_r \mathrm{lca}_r(x, y, z)$. For a phylogenetic tree $T$, we denote by $\mathcal{R}(T)$ the set of all triples that are displayed by $T$.
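Using $\mathrm{lca}$, the set $\mathcal{R}(T)$ can be enumerated by checking, for every three leaves, which pair has a strictly lower last common ancestor than the whole triple. The Python sketch below is an illustration under assumed conventions (child → parent dictionary; a triple $((x, y), z)$ is encoded as the tuple (x, y, z) with x < y); it performs $O(|L|^3)$ lca computations and makes no attempt at efficiency.

```python
from itertools import combinations

def path_to_root(parent, x):
    """Vertices on the path from x to the root, x first."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def lca(parent, A):
    """Lowest vertex lying above every element of the non-empty collection A."""
    A = list(A)
    common = set(path_to_root(parent, A[0])).intersection(
        *(path_to_root(parent, a) for a in A[1:]))
    return next(v for v in path_to_root(parent, A[0]) if v in common)

def triples(parent, leaves):
    """R(T): all triples ((x, y), z) displayed by T, as tuples (x, y, z) with x < y."""
    result = set()
    for trio in combinations(sorted(leaves), 3):
        top = lca(parent, trio)
        for z in trio:
            x, y = sorted(set(trio) - {z})
            if lca(parent, (x, y)) != top:     # lca(x, y) strictly below lca(x, y, z)
                result.add((x, y, z))
    return result

T = {"r": None, "u": "r", "v": "r", "a": "u", "b": "u", "c": "v", "d": "v"}
assert triples(T, "abcd") == {("a", "b", "c"), ("a", "b", "d"),
                              ("c", "d", "a"), ("c", "d", "b")}
```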

Clearly, a set $\mathcal{R}$ of triples is consistent if there is a phylogenetic tree $T$ on $\bigcup_{r \in \mathcal{R}} L_r$ such that $\mathcal{R} \subseteq \mathcal{R}(T)$. Not all sets of triples are consistent, of course. Given a triple set $\mathcal{R}$, there is a polynomial-time algorithm, referred to in [27] as BUILD, that either constructs a phylogenetic tree $T$ that displays $\mathcal{R}$ or recognizes that $\mathcal{R}$ is inconsistent, that is, not consistent [30]. Various practical implementations have been described, starting with [30]; improved variants are discussed in [31,32].
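A compact recursive version of BUILD shows how it either produces a tree or detects inconsistency. This Python sketch follows the textbook formulation (build the graph on the current leaf set that joins $x$ and $y$ for every triple $((x, y), z)$ on those leaves, then recurse on its connected components, failing if there is only one). Triples are encoded as tuples (x, y, z), trees as nested tuples; the function name is hypothetical and none of the efficiency improvements of [31,32] are attempted.

```python
def build(leaves, triples):
    """Tree displaying all triples ((x, y), z), given as (x, y, z), as nested tuples;
    None if the triple set is inconsistent."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Join x and y for every triple ((x, y), z) whose three leaves are all present.
    adj = {v: set() for v in leaves}
    for x, y, z in triples:
        if {x, y, z} <= leaves:
            adj[x].add(y)
            adj[y].add(x)
    # Connected components by depth-first search.
    components, seen = [], set()
    for start in sorted(leaves):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        components.append(comp)
    if len(components) == 1:
        return None                      # no split is possible: inconsistent
    subtrees = []
    for comp in components:
        sub = build(comp, [t for t in triples if set(t) <= comp])
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

assert build("abcd", [("a", "b", "c"), ("c", "d", "a")]) == (("a", "b"), ("c", "d"))
assert build("abc", [("a", "b", "c"), ("b", "c", "a")]) is None
```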

The problem of determining a maximum consistent subset $\mathcal{R}' \subseteq \mathcal{R}$ of an inconsistent set of triples, on the other hand, is NP-hard and also APX-hard; see [33,34] and the references therein. We refer to [35] for an overview of the available practical approaches and further theoretical results.

Furthermore, for a given triple set $\mathcal{R}$, the BUILD algorithm does not necessarily generate a minimal phylogenetic tree $T$ that displays $\mathcal{R}$, i.e., $T$ may resolve multifurcations in an arbitrary way that is not implied by any of the triples in $\mathcal{R}$. However, the tree generated by BUILD is minor-minimal, i.e., if $T'$ is obtained from $T$ by contracting an edge, then $T'$ no longer displays $\mathcal{R}$. The trees produced by BUILD do not necessarily have the minimum number of interior vertices. Thus, depending on $\mathcal{R}$, not all trees consistent with $\mathcal{R}$ can be obtained from BUILD.
Semple [36] gives an algorithm that produces all minor-minimal trees consistent with $\mathcal{R}$. It requires only polynomial time for each of the possibly exponentially many minor-minimal trees. The problem of constructing a tree that is consistent with $\mathcal{R}$ and minimizes the number of interior vertices is NP-hard and hard to approximate [37].
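Minor-minimality can be tested directly by contracting each interior edge in turn. The Python sketch below assumes the same hypothetical encodings as before (child → parent dictionaries, triples as tuples (x, y, z) for $((x, y), z)$) and decides whether a triple is displayed via the criterion $\mathrm{lca}(x, y) \prec \mathrm{lca}(x, y, z)$.

```python
def path_to_root(parent, x):
    """Vertices on the path from x to the root, x first."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def lca(parent, A):
    """Lowest vertex lying above every element of the non-empty collection A."""
    A = list(A)
    common = set(path_to_root(parent, A[0])).intersection(
        *(path_to_root(parent, a) for a in A[1:]))
    return next(v for v in path_to_root(parent, A[0]) if v in common)

def displays_all(parent, triples):
    """Each ((x, y), z) is displayed iff lca(x, y) lies strictly below lca(x, y, z)."""
    return all(lca(parent, (x, y)) != lca(parent, (x, y, z)) for x, y, z in triples)

def contract(parent, v):
    """Contract the edge [parent[v], v]: the children of v are re-attached to parent[v]."""
    u = parent[v]
    return {w: (u if p == v else p) for w, p in parent.items() if w != v}

def is_minor_minimal(parent, triples):
    """T displays all triples, but no tree obtained by contracting one interior edge does."""
    if not displays_all(parent, triples):
        return False
    interior = {p for p in parent.values() if p is not None}
    return all(not displays_all(contract(parent, v), triples)
               for v in interior if parent[v] is not None)

T = {"r": None, "u": "r", "v": "r", "a": "u", "b": "u", "c": "v", "d": "v"}
assert not is_minor_minimal(T, [("a", "b", "c")])
assert is_minor_minimal(T, [("a", "b", "c"), ("c", "d", "a")])
```

In the toy example, ((a, b), (c, d)) is minor-minimal for {((a, b), c), ((c, d), a)} but not for {((a, b), c)} alone, because the cherry (c, d) is then not supported by any triple.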

Event labeling, species labeling, and reconciliation map

A gene tree T arises through a series of events along a species tree S. We consider both T and S as phylogenetic trees with leaf sets L (the set of genes) and B (the set of species), respectively. We assume that |L| ≥ 3 and |B| ≥ 1. We consider only gene duplications and gene losses, which take place between speciation events, i.e., along the edges of S. Speciation events are modeled by transmitting the gene content of an ancestral lineage to each of its daughter lineages.

The true evolutionary history of a single ancestral gene can thus be thought of as a scenario such as the one depicted in Figure 1. Since we do not consider horizontal gene transfer or lineage sorting in this contribution, an evolutionary scenario consists of four components: (1) a true gene tree $\widetilde{T}$, (2) a true species tree $\widetilde{S}$, (3) an assignment of an event type (i.e., speciation •, duplication □, loss ⊗, or observable (extant) gene ⊙) to each interior vertex and leaf of $\widetilde{T}$, and (4) a map $\widetilde{\mu}$ assigning every vertex of $\widetilde{T}$ to a vertex or edge of $\widetilde{S}$ in such a way that (a) the ancestor order of $\widetilde{T}$ is preserved, (b) a vertex of $\widetilde{T}$ is mapped to an interior vertex of $\widetilde{S}$ if and only if it is of type speciation, and (c) extant genes of $\widetilde{T}$ are mapped to leaves of $\widetilde{S}$. Alternatively, one could define $\widetilde{T}$ and $\widetilde{S}$ to be metric graphs (i.e., comprising edges that are real intervals glued together at the vertices) with a distance function that measures evolutionary time. In this picture, $\widetilde{\mu}$ is a continuous map that preserves the temporal order and satisfies conditions (b) and (c).

Figure 1. Gene trees. Left: Example of an evolutionary scenario showing the evolution of a gene family. The corresponding true gene tree $\widetilde{T}$ appears embedded in the true species tree $\widetilde{S}$. The map $\widetilde{\mu}$ is implicitly given by drawing the species tree superimposed on the gene tree. In particular, the speciation vertices in the gene tree (red circles) are mapped to the vertices of the species tree (gray ovals) and the duplication vertices (blue squares) to the edges of the species tree. Gene losses are represented by "⊗" (mapping to edges in $\widetilde{S}$). The observable species a, b, ..., f are the leaves of the species tree (green ovals), and extant genes therein are labeled with "⊙". Right: The corresponding gene tree T with observed events from the left tree. Leaves are labeled with the corresponding species.

In order to allow $\widetilde{\mu}$ to map duplication vertices to a time point before the last common ancestor of all species in $\widetilde{S}$, we need to extend our definition of a species tree by adding an extra vertex and an extra edge "above" the last common ancestor of all species. Note that, strictly speaking, $\widetilde{S}$ is then no longer a phylogenetic tree. If there is no danger of confusion, we will from now on refer to a phylogenetic tree on $B$ with this extra edge and vertex added as a species tree on $B$, and to $\rho_B$ as the root of $B$. We also canonically extend our notions of a triple, displaying, etc. to this new type of species tree.

The true gene tree $\widetilde{T}$ represents all extant as well as all extinct genes, and all duplication and speciation events. Not all of these events are observable from extant gene data, however. In particular, extinct genes cannot be observed. The observable part $T = (V, E)$ of $\widetilde{T}$ is the restriction of $\widetilde{T}$ to the leaf set $L$ of extant genes, i.e., $T = \widetilde{T}|_L$.

Furthermore, we can observe a map $\sigma: L \to B$ that assigns to each extant gene the species in which it resides. Of course, for $x \in L$ we have $\widetilde{\mu}(x) = \sigma(x)$. Here $B$ is the leaf set of the extant species tree, i.e., $B = \sigma(L)$. For ease of readability, we also put $\sigma(T') = \{\sigma(x) : x \in L(y)\}$ for any subtree $T' = T(y)$ of $T$ with $y \in V$. Alternatively, we will sometimes also write $\sigma(y)$ instead of $\sigma(T(y))$. Last but not least, for $Y \subseteq L$, we put $\sigma(Y) = \{\sigma(y) : y \in Y\}$.

The observable part of the species tree $S = (W, H)$ is the restriction $\widetilde{S}|_B$ of $\widetilde{S}$ to $B$. In order to account for duplication events that occurred before the first speciation event, the additional vertex $\rho_S \in W$ and the additional edge $[\rho_S, \mathrm{lca}_S(B)] \in H$ must be part of $S$.

The evolutionary scenario also implies an event labeling map $t: V^0 \to \{\bullet, \square\}$ that assigns to each interior vertex $v$ of $T$ a value $t(v)$ indicating whether $v$ is a speciation event (•) or a duplication event (□). It is convenient to use the special label ⊙ for the leaves $x$ of $T$. We write $(T, t)$ for the event-labeled tree. We remark that $t$ was introduced as a "symbolic dating map" in [24]. It is called discriminating if, for all edges $\{u, v\} \in E$, we have $t(u) \neq t(v)$, in which case $(T, t)$ is known to be in one-to-one correspondence with a cograph [22]. Note that in this contribution we will in general not require $t$ to be discriminating. For a gene tree $T = (V, E)$ on $L$, a set $B$ of species, and maps $t$ and $\sigma$ as specified above, we require, however, that $t$ and $\sigma$ satisfy the following compatibility property:

(C) Let $z \in V$ be a speciation vertex, i.e., $t(z) = \bullet$, and let $T'$ and $T''$ be subtrees of $T$ rooted in two distinct children of $z$. Then $\sigma(T') \cap \sigma(T'') = \emptyset$.

Note that we do not require the converse, i.e., from the disjointness of the species sets $\sigma(T')$ and $\sigma(T'')$ we do not conclude that their last common ancestor is a speciation vertex.
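Condition (C) is easy to verify for a given labeled gene tree. The following Python sketch is illustrative only: it assumes a child → parent dictionary for $T$, the strings "spec" and "dup" as stand-ins for • and □, and a dictionary sigma for $\sigma$; the function name is hypothetical. It computes $\sigma(T')$ for the subtrees below each speciation vertex and tests them for pairwise disjointness.

```python
from itertools import combinations

def satisfies_C(parent, event, sigma):
    """Condition (C): at every speciation vertex ('spec'), the species sets of the
    subtrees rooted at distinct children are pairwise disjoint."""
    children = {}
    for v, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(v)

    def species(x):
        """sigma(T(x)): the species of all genes below x."""
        if x not in children:
            return {sigma[x]}
        return set().union(*(species(c) for c in children[x]))

    for z, kids in children.items():
        if event.get(z) == "spec":
            for s1, s2 in combinations([species(c) for c in kids], 2):
                if s1 & s2:
                    return False
    return True

T = {"r": None, "u": "r", "v": "r", "a1": "u", "b1": "u", "a2": "v", "b2": "v"}
sigma = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
assert satisfies_C(T, {"r": "dup", "u": "spec", "v": "spec"}, sigma)
assert not satisfies_C(T, {"r": "spec", "u": "spec", "v": "spec"}, sigma)
```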

For $x, y \in L$ and $z = \mathrm{lca}_T(x, y)$, it immediately follows from condition (C) that if $t(z) = \bullet$ then $\sigma(x) \neq \sigma(y)$, since, by assumption, $x$ and $y$ are leaves in distinct subtrees below $z$. Equivalently, two distinct genes $x \neq y$ in $L$ for which $\sigma(x) = \sigma(y)$ holds, that is, which are contained in the same species of $B$, must have originated from a duplication event, i.e., $t(\mathrm{lca}_T(x, y)) = \square$. Thus we can regard $\sigma$ as a proper vertex coloring of the cograph corresponding to $(T, t)$.
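The consequence just derived gives a quick way to read off the vertices that are forced to be duplications, namely the last common ancestors of pairs of genes from the same species. A minimal Python sketch under the same assumed encodings (child → parent dictionary, sigma as a dictionary; forced_duplications is a hypothetical name):

```python
from itertools import combinations

def path_to_root(parent, x):
    """Vertices on the path from x to the root, x first."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def lca(parent, x, y):
    """Last common ancestor of x and y."""
    above_x = set(path_to_root(parent, x))
    return next(v for v in path_to_root(parent, y) if v in above_x)

def forced_duplications(parent, sigma):
    """All lca_T(x, y) with x != y and sigma(x) == sigma(y); by (C) these must be labeled □."""
    return {lca(parent, x, y) for x, y in combinations(sorted(sigma), 2)
            if sigma[x] == sigma[y]}

T = {"r": None, "u": "r", "v": "r", "a1": "u", "b1": "u", "a2": "v", "b2": "v"}
sigma = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
assert forced_duplications(T, sigma) == {"r"}
```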

Let us now consider the properties of the restriction of $\widetilde{\mu}$ to the observable parts $T$ of $\widetilde{T}$ and $S$ of $\widetilde{S}$. Consider a speciation vertex $x$ in $\widetilde{T}$. If $x$ has two children $y'$ and $y''$ such that $L(y')$ and $L(y'')$ are both non-empty, then $\mathrm{lca}_{\widetilde{T}}(z', z'') = x$ for all $z' \in L(y')$ and $z'' \in L(y'')$, and hence $x = \mathrm{lca}_T(L(y') \cup L(y''))$. In particular, $x$ is an observable vertex in $T$. Furthermore, we know that $\widetilde{\mu}(x) = \mathrm{lca}_{\widetilde{S}}(\sigma(z'), \sigma(z''))$, and therefore $\widetilde{\mu}(x) = \mathrm{lca}_S(\sigma(L(y') \cup L(y'')))$. Considering all pairs of children with this property, this can be rephrased as $\widetilde{\mu}(x) = \mathrm{lca}_S(\sigma(x))$. On the other hand, if $x$ does not have at least two children with this property, and hence the corresponding speciation vertex cannot be viewed as the most recent common ancestor of the set of its descendants in $S$, then $x$ is not a vertex in the restriction $\widetilde{T}|_L$ of $\widetilde{T}$ to the set $L$ of extant genes. The restriction $\mu$ of $\widetilde{\mu}$ to the observable tree $T$ therefore satisfies the properties used below to define reconciliation maps.

Definition 1. Suppose that B is a set of species, that S = (W, H) is a phylogenetic tree on B, that T = (V, E) is a gene tree with leaf set L, and that σ : L → B and t : V → {•, □, ⊙} are the maps described above. Then we say that S is a species tree for (T,t, σ) if there is a map µ : V → W ∪ H such that, for all x ∈ V:

(i) If t(x) = ⊙ then µ(x) = σ(x).

(ii) If t(x) = • then µ(x) ∈ W \ B.

(iii) If t(x) = □ then µ(x) ∈ H.

(iv) Let x, y ∈ V with x ≺T y. We distinguish two cases:

1. If t(x) = t(y) = □ then µ(x) ⪯S µ(y) in S.

2. If t(x) = t(y) = • or t(x) ≠ t(y) then µ(x) ≺S µ(y) in S.

(v) If t(x) = • then µ(x) = lcaS(σ(L(x))).

We call µ the reconciliation map from (T,t, σ ) to S.

We note that µ⁻¹(ρS) = ∅ holds as an immediate consequence of property (v), which implies that no speciation node can be mapped above lcaS(B), the unique child of ρS.
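Conditions (i)-(v) can be checked directly for a candidate map. The Python sketch below is a hypothetical verifier and is not part of the original work. It relies on several encoding assumptions:

- Trees are dictionaries from each vertex to its list of children, and vertices without children are the leaves.
- "S" and "D" stand for • and □.
- An edge [u, v] of S is the tuple (u, v), where u is the parent of v.
- ⪯S is extended to edges by placing [u, v] strictly between u and v.
- The species tree is planted, so its root ρS has the single child lcaS(B).

```python
def parents(children):
    """Map each non-root vertex to its parent."""
    return {c: p for p, cs in children.items() for c in cs}


def leaves_below(children, v):
    kids = children.get(v, [])
    if not kids:
        return {v}
    return set().union(*(leaves_below(children, c) for c in kids))


def below_eq(par, a, b):
    """a ⪯ b: b lies on the path from a up to the root."""
    while a is not None:
        if a == b:
            return True
        a = par.get(a)
    return False


def le_S(par, p, q):
    """p ⪯_S q for vertices (strings) or edges (parent, child) of S."""
    if isinstance(p, tuple) and isinstance(q, tuple):
        return below_eq(par, p[1], q[1])
    if isinstance(p, tuple):      # edge (u, v) lies below q iff u ⪯ q
        return below_eq(par, p[0], q)
    if isinstance(q, tuple):      # vertex p lies below edge (u, v) iff p ⪯ v
        return below_eq(par, p, q[1])
    return below_eq(par, p, q)


def lca_set(par, nodes):
    """Lowest vertex of S lying above every vertex in `nodes`."""
    nodes = list(nodes)
    w = nodes[0]
    while not all(below_eq(par, v, w) for v in nodes):
        w = par[w]
    return w


def is_reconciliation(Tc, t, sigma, Sc, mu):
    """Check conditions (i)-(v) of Definition 1 for the map mu."""
    Tpar, Spar = parents(Tc), parents(Sc)
    edges = {(p, c) for p, cs in Sc.items() for c in cs}
    species = {v for v in Spar if not Sc.get(v)}
    for x in set(Tpar) | set(Tc):
        if not Tc.get(x):                                   # (i) leaves
            if mu[x] != sigma[x]:
                return False
        elif t[x] == "S":                                   # (ii) and (v)
            if mu[x] in edges or mu[x] in species:
                return False
            if mu[x] != lca_set(Spar, {sigma[g] for g in leaves_below(Tc, x)}):
                return False
        elif mu[x] not in edges:                            # (iii)
            return False
        y = Tpar.get(x)                                     # (iv): every y with x ≺_T y
        while y is not None:
            both_dup = t.get(x) == "D" and t.get(y) == "D"
            if not le_S(Spar, mu[x], mu[y]) or (not both_dup and mu[x] == mu[y]):
                return False
            y = Tpar.get(y)
    return True
```

Moving the duplication root onto an edge below the speciations violates condition (iv), and the checker reports this.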

We illustrate this definition by means of an example in Figure 2 and remark that it is consistent with the definition of reconciliation maps for the case when the event labeling t on T is not known [38]. Continuing with our notation from Definition 1 for the remainder of this section, we easily derive their axiom set as follows.

Figure 2. Mapping μ. Example of the mapping μ of nodes of the gene tree T to the species tree S. Speciation nodes in the gene tree (red circles) are mapped to nodes in the species tree, and duplication nodes (blue squares) are mapped to edges in the species tree. σ is shown as dashed green arrows. For clarity of exposition, we have identified the leaves of the gene tree on the left with the species they reside in via the map σ.

Lemma 2. If µ is a reconciliation map from (T,t, σ) to S and L is the leaf set of T, then, for all x ∈ V:

(D1) x ∈ L implies µ(x) = σ(x).

(D2.a) µ(x) ∈ W implies µ(x) = lcaS(σ(L(x))).

(D2.b) µ(x) ∈ H implies lcaS(σ(L(x))) ≺S µ(x).

(D3) Suppose x, y ∈ V such that x ≺T y. If µ(x), µ(y) ∈ H then µ(x) ⪯S µ(y); otherwise µ(x) ≺S µ(y).

Proof. Suppose x ∈ V. Then (D1) is equivalent to (i) and the fact that t(x) = ⊙ if and only if x ∈ L. Conditions (ii) and (v) together imply (D2.a). If µ(x) ∈ H then x is a duplication vertex of T. From condition (iv) we conclude that lcaS(σ(L(x))) ⪯S µ(x). Since lcaS(σ(L(x))) ∈ W, equality cannot hold, and so (D2.b) follows. (D3) is an immediate consequence of (iv). □

For T a gene tree, B a set of species and maps σ and t as above, our goal is now to characterize (1) those (T,t, σ) for which a species tree on B exists and (2) species trees on B that are species trees for (T,t, σ).

Results and discussion

Results

Unless stated otherwise, we continue with our assumptions on B, (T,t, σ), and S as stated in Definition 1. We start with the simple observation that a reconciliation map from (T,t, σ) to S preserves the ancestor order of T and hence T imposes a strong constraint on the relationship of most recent common ancestors in S:

Lemma 3. Let µ : V → W ∪ H be a reconciliation map from (T,t, σ) to S. Then

$$\mathrm{lca}_S(\mu(x), \mu(y)) \preceq_S \mu(\mathrm{lca}_T(x, y)) \tag{1}$$

holds for all x, y ∈ V.

Proof. Assume that x and y are distinct vertices of T, and consider the unique path P connecting x with y. P is uniquely subdivided into a path P' from x to lcaT(x, y) and a path P" from lcaT(x, y) to y. Condition (iv) implies that the images under µ of the vertices of P' and of P", respectively, are ordered in S with respect to ⪯S. Hence they are contained in the intervals [µ(x), µ(lcaT(x, y))] and [µ(y), µ(lcaT(x, y))], which connect µ(lcaT(x, y)) with µ(x) and µ(y), respectively. In particular, µ(lcaT(x, y)) is the largest element with respect to ⪯S in the union of these two intervals. This union contains the unique path from µ(x) to µ(y) and hence also lcaS(µ(x), µ(y)). □

Equation (1) is well known to hold for gene tree/species tree reconciliation in the absence of a prescribed event labeling of T.

Since a phylogenetic tree (in the original sense) T is uniquely determined by its induced triple set 𝔯(T), it is reasonable to expect that all the information on the species tree(s) for (T,t, σ) is contained in the images of the triples in 𝔯(T) (or, more precisely, of their leaves) under σ. However, this is not the case in general, because not all triples in 𝔯(T) are informative about a species tree that displays T. The reason is that duplications may generate distinct paralogs long before the divergence of the species in which they eventually appear. To address this problem, we associate to (T,t, σ) the set of triples

$$\mathfrak{G}(T,t,\sigma) = \{\, ((x,y),z) \in \mathfrak{r}(T) \mid t(\mathrm{lca}_T(x,y,z)) = \bullet \text{ and } \sigma(x), \sigma(y), \sigma(z) \text{ are pairwise distinct} \,\}. \tag{2}$$

As we shall see below, 𝔊(T,t, σ) contains all the information on a species tree for (T,t, σ) that can be gleaned from (T,t, σ).

Lemma 4. If µ is a reconciliation map from (T,t, σ) to S and ((x, y), z) ∈ 𝔊(T,t, σ), then S displays ((σ(x), σ(y)), σ(z)).

Proof. Put 𝔊 = 𝔊(T,t, σ) and recall that L denotes the leaf set of T. Let ((x, y), z) ∈ 𝔊 and assume w.l.o.g. that <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M62','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M62">View MathML</a>. First consider the case that t(lcaT(x, y)) = •. From condition (v) we conclude that µ(lcaT(x, y)) = lcaS(σ(x), σ(y)) and µ(lcaT(x, y, z)) = lcaS(σ(x), σ(y), σ(z)). Since, by assumption, lcaT(x, y) ≺T lcaT(x, y, z), condition (iv) yields µ(lcaT(x, y)) ≺S µ(lcaT(x, y, z)). From lcaT(x, z) = lcaT(y, z) = lcaT(x, y, z) we conclude that S must display ((σ(x), σ(y)), σ(z)), as S is assumed to be a species tree for (T,t, σ).

Now suppose that t(lcaT(x, y)) = □, and therefore µ(lcaT(x, y)) ∈ H. Moreover, µ(lcaT(x, y, z)) ∈ W holds. Hence, Lemma 3 and property (iv) together imply that lcaS(σ(x), σ(y)) ⪯S µ(lcaT(x, y)) ≺S µ(lcaT(x, y, z)) = lcaS(σ(x), σ(y), σ(z)). Thus, we again obtain that the triple ((σ(x), σ(y)), σ(z)) is displayed by S. □

It is important to note that a similar argument cannot be made for triples in 𝔯(T) rooted in a duplication vertex of T, as such triples are in general not displayed by a species tree for (T,t, σ). We present a generic counterexample in Figure 3. To state our main result (Theorem 6), we require a further definition.

Figure 3. Triples with duplication event at the root. Triples from T whose root is a duplication event are in general not displayed by the species tree S. (a) Triple with duplication event at the root obtained from the true evolutionary history of T shown in panel (b). Panel (c) is the true species tree. In the triple (a), species y appears as the outgroup even though x is the outgroup in the true species tree.

Definition 5. For (T,t, σ), we define the set

$$\mathfrak{S}(T,t,\sigma) = \{\, ((a,b),c) \mid \exists\, ((x,y),z) \in \mathfrak{G}(T,t,\sigma) \text{ with } \sigma(x) = a,\ \sigma(y) = b,\ \sigma(z) = c \,\}. \tag{3}$$

As an immediate consequence of Lemma 4, every triple in 𝔖(T,t, σ) must be displayed by any species tree for (T,t, σ) with leaf set B.
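Both sets can be enumerated by brute force over all triples of genes. This Python sketch makes two assumptions. First, 𝔊(T,t, σ) consists of the triples ((x, y), z) of T whose root lcaT(x, y, z) is a speciation vertex and whose three leaves lie in pairwise distinct species. Second, 𝔖(T,t, σ) is the image of 𝔊(T,t, σ) under σ. The encoding is the same as before: a children dictionary, with "S" for • and "D" for □. The helper names are hypothetical, and the O(|L|³) loop is meant for illustration only.

```python
from itertools import combinations


def parents(children):
    return {c: p for p, cs in children.items() for c in cs}


def ancestors(par, v):
    """Path from v up to the root, v included."""
    path = [v]
    while v in par:
        v = par[v]
        path.append(v)
    return path


def lca(par, a, b):
    above_a = set(ancestors(par, a))
    return next(w for w in ancestors(par, b) if w in above_a)


def informative_triples(children, t, sigma):
    """Return (G, S): gene triples rooted at a speciation with three distinct
    species, and their images under sigma (cherry written in sorted order)."""
    par = parents(children)
    genes = sorted(v for v in par if not children.get(v))
    G, S = set(), set()
    for trio in combinations(genes, 3):
        for z in trio:
            x, y = (v for v in trio if v != z)
            top = lca(par, x, z)
            # ((x, y), z) is a triple of T iff lca(x, y) lies strictly below
            # lca(x, z) = lca(y, z) = lca(x, y, z)
            if lca(par, x, y) == top or lca(par, y, z) != top:
                continue
            if t[top] == "S" and len({sigma[x], sigma[y], sigma[z]}) == 3:
                G.add(((x, y), z))
                a, b = sorted((sigma[x], sigma[y]))
                S.add(((a, b), sigma[z]))
    return G, S
```

In the test tree, all triples rooted at the duplication root, such as ((a2, b2), c1), are discarded, in line with the situation of Figure 3.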

Theorem 6. Let S be a species tree with leaf set B. Then there exists a reconciliation map µ from (T,t, σ) to S whenever S displays all triples in 𝔖(T,t, σ).

Proof. Recall that L is the leaf set of T = (V, E). Put S = (W, H) and 𝔖 = 𝔖(T,t, σ). We first consider the subset G = L ∪ {x ∈ V | t(x) = •} of V comprising the leaves and speciation vertices of T.

We explicitly construct the map µ : G → W as follows. For all x ∈ G, we put

(M1) µ(x) = σ(x) if t(x) = ⊙,

(M2) µ(x) = lcaS(σ(L(x))) if t(x) = •.

Note that alternative (M1) ensures that µ satisfies Condition (i). Also note that, in view of the simple consequence following the statement of Condition (C), for all x ∈ V with t(x) = • there are leaves y', y" ∈ L(x) with σ(y') ≠ σ(y"). Thus lcaS(σ(L(x))) ∈ W \ B, i.e., µ satisfies Condition (ii). Finally, by definition, alternative (M2) ensures that µ satisfies Condition (v).

Claim: If x, y ∈ G with x ≺T y, then µ(x) ≺S µ(y).

Since y cannot be a leaf of T as x ≺T y, we have t(y) = •. There are two cases to consider: either t(x) = • or t(x) = ⊙. In the latter case µ(x) = σ(x) ∈ B, while µ(y) ∈ W \ B as argued above. Since x ∈ L(y), we have µ(x) = σ(x) ≺S lcaS(σ(L(y))) = µ(y), as desired.

Now suppose t(x) = •. Again by the simple consequence following Condition (C), there are leaves x', x" ∈ L(x) with a = σ(x') ≠ σ(x") = b. Since x ≺T y and t(y) = •, Condition (C) implies that c = σ(y') ∉ σ(L(x)) holds for all y' ∈ L(y) \ L(x). Thus, <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M79','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M79">View MathML</a>. But then ((a, b), c) is displayed by S, and therefore lcaS(a, b) ≺S lcaS(a, b, c). Since this holds for all triples ((x', x"), y') with x', x" ∈ L(x) and y' ∈ L(y) \ L(x), we conclude that µ(x) = lcaS(σ(L(x))) ≺S lcaS(σ(L(y))) = µ(y), which establishes the claim. It follows immediately that µ also satisfies Condition (iv.2) if x and y are contained in G.

Next, we extend the map µ to the entire vertex set V of T using the following observation. Let x ∈ V with t(x) = □. We know by Lemma 3 that µ(x) is an edge [u, v] ∈ H such that lcaS(σ(L(x))) ⪯S v. Such an edge exists for v = lcaS(σ(L(x))) by construction. Every speciation vertex y ∈ V with x ≺T y therefore necessarily maps above this edge, i.e., <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M85','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M85">View MathML</a> must hold. Thus we set

(M3) µ(x) = [u, lcaS(σ(L(x)))] if t(x) = □, where u is the parent of lcaS(σ(L(x))) in S,

which now makes µ a map from V to W ∪ H.

By construction, Conditions (iii), (iv.2) and (v) are thus satisfied by µ. On the other hand, if there is a speciation vertex y between two duplication vertices x and x' of T, i.e., x ≺T y ≺T x', then µ(x) ≺S µ(y) ≺S µ(x'). Thus µ also satisfies Condition (iv.1).

It follows that μ is a reconciliation map from (T,t, σ) to S. □

Corollary 7. Suppose that S is a species tree for (T,t, σ) and that L and B are the leaf sets of T and S, respectively. Then a reconciliation map µ from (T,t, σ) to S can be constructed in O(|L||B|) time.

Proof. In order to find the image of an interior vertex x of T under µ, it suffices to determine σ(L(x)) and lcaS(σ(L(x))). The sets σ(L(x)) can be computed for all x simultaneously, e.g., by a bottom-up traversal of T in O(|L||B|) time. The latter task can be solved in linear time using the idea presented in [39] to calculate the lowest common ancestor of a group of nodes in the species tree. □
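A direct, unoptimized transcription of (M1)-(M3) is sketched below in Python. It illustrates the construction and is not the authors' implementation. As in the proof of Corollary 7, σ(L(x)) is collected in one bottom-up pass. However, lcaS is computed naively by intersecting ancestor paths rather than with the linear-time method of [39]. Each duplication is placed on the edge directly above lcaS(σ(L(x))), which is only one of the admissible placements. The species tree is assumed to be planted, so lcaS(B) has the parent ρS. "S" and "D" stand for • and □, and the function name is hypothetical.

```python
def reconcile(Tc, troot, t, sigma, Sc):
    """Construct mu by (M1)-(M3): leaves go to their species, speciations to
    lca_S(sigma(L(x))), duplications to the edge directly above that vertex."""
    Spar = {c: p for p, cs in Sc.items() for c in cs}

    def path(v):  # v and all of its ancestors in S
        out = [v]
        while v in Spar:
            v = Spar[v]
            out.append(v)
        return out

    def lca_set(spp):  # naive set-LCA; [39] describes a faster method
        spp = list(spp)
        common = set(path(spp[0]))
        for s in spp[1:]:
            common &= set(path(s))
        return max(common, key=lambda w: len(path(w)))  # deepest common ancestor

    below = {}  # sigma(L(x)) for every vertex x, filled bottom-up

    def collect(x):
        kids = Tc.get(x, [])
        below[x] = {sigma[x]} if not kids else set().union(*(collect(c) for c in kids))
        return below[x]

    collect(troot)
    mu = {}
    for x, spp in below.items():
        if not Tc.get(x):
            mu[x] = sigma[x]                            # (M1)
        else:
            v = lca_set(spp)
            mu[x] = v if t[x] == "S" else (Spar[v], v)  # (M2) or (M3)
    return mu
```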

We remark that, given a species tree S on B that displays all triples in 𝔖(T,t, σ), there is no freedom in the construction of a reconciliation map on the set G of leaves and speciation vertices of T. The duplication vertices of T, however, can be placed differently, possibly resulting in exponentially many reconciliation maps from (T,t, σ) to S.

Lemma 4 implies that consistency of the triple set 𝔖(T,t, σ) is necessary for the existence of a reconciliation map from (T,t, σ) to a species tree on B. Theorem 6, on the other hand, establishes that this condition is also sufficient. Thus, we have

Theorem 8. There is a species tree on B for (T,t, σ) if and only if the triple set 𝔖(T,t, σ) is consistent.
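Theorem 8 reduces the existence question to a consistency test for 𝔖(T,t, σ), which the BUILD algorithm [27] decides. The following Python sketch follows the classical recursive scheme of BUILD:

1. For every triple ((a, b), c) whose three leaves lie in the current leaf set, join a and b by an edge.
2. If the resulting graph is connected, report inconsistency.
3. Otherwise, recurse on the connected components.

The function name and the nested-tuple output format are our own choices for illustration.

```python
def build(leaves, triples):
    """Return a tree (nested tuples) displaying all triples ((a, b), c),
    or None if the triple set is inconsistent."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    active = [r for r in triples if {r[0][0], r[0][1], r[1]} <= leaves]
    adj = {v: set() for v in leaves}
    for (a, b), _c in active:
        adj[a].add(b)
        adj[b].add(a)
    components, seen = [], set()
    for v in sorted(leaves):
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:  # depth-first search for the component containing v
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    if len(components) == 1:
        return None  # connected graph: no split exists, so the triples are inconsistent
    subtrees = [build(c, active) for c in components]
    return None if any(s is None for s in subtrees) else tuple(subtrees)
```

The output may contain multifurcations. BUILD returns one particular tree that displays the triples, not all such trees.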

We remark that a related result is proven in [26, Theorem 5] for the full tree reconciliation problem starting from a forest of gene trees.

It may be surprising that the fact that the triples in 𝔖(T,t, σ) are derived from an event-labeled gene tree (T,t, σ) imposes no strong restrictions on this set.

Theorem 9. For every set ℛ of triples on some finite set B of size at least one, there is a gene tree T = (V, E) with leaf set L, together with an event map t : V → {•, □, ⊙} and a map σ : L → B that assigns to every leaf of T the species in B it resides in, such that 𝔖(T,t, σ) = ℛ.

Proof. Irrespective of whether ℛ is consistent or not, we construct the components of the required 3-tuple (T,t, σ) as follows. To each triple <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M95','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M95">View MathML</a> we associate a triple <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M96','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M96">View MathML</a> via a map <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M97','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M97">View MathML</a> with <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M98','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M98">View MathML</a> for i = 1, 2, 3, where we assume that for any two distinct triples <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M99','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M99">View MathML</a> we have that σk(Lk) ∩ σl(Ll) = <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M100','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M100">View MathML</a>. Then we obtain T = (V, E) by first adding a single new vertex ρT to the union of the vertex sets of the triples Tk and then connecting ρT to the root ρk of each of the triples Tk. Clearly, T is a phylogenetic tree on L = ⋃k Lk. Next, we define the map t : V → {•, □, ⊙} by putting t(ρT) = □, t(a) = ⊙ for all a ∈ L, and t(a) = • for all a ∈ V \ (L ∪ {ρT}). Finally, we define the map σ : L → B by putting, for all a ∈ L, σ(a) = σk(a), where a ∈ Lk. Clearly, 𝔖(T,t, σ) = ℛ. □
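The construction in the proof of Theorem 9 is simple enough to write down explicitly. In this Python sketch, the vertex names (rho, rk, uk, xk, yk, zk) and the "S"/"D" labels for • and □ are hypothetical. Each species triple receives three fresh genes, so the leaf sets Lk are pairwise disjoint. The root is labeled as a duplication, and all other interior vertices are labeled as speciations.

```python
def gene_tree_from_triples(species_triples):
    """Build (children, t, sigma) as in the proof of Theorem 9: a duplication
    root whose subtrees are disjoint gene triples, one per species triple."""
    children, t, sigma = {"rho": []}, {"rho": "D"}, {}
    for k, ((a, b), c) in enumerate(sorted(species_triples)):
        root_k, inner_k = f"r{k}", f"u{k}"
        x, y, z = f"x{k}", f"y{k}", f"z{k}"  # fresh genes, so the sets L_k are disjoint
        children["rho"].append(root_k)
        children[root_k] = [inner_k, z]
        children[inner_k] = [x, y]
        t[root_k] = t[inner_k] = "S"  # every non-root interior vertex is a speciation
        sigma.update({x: a, y: b, z: c})
    return children, t, sigma
```

The input may be inconsistent: for example, {((A, B), C), ((A, C), B)} yields a valid event-labeled gene tree even though no species tree displays both triples.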

We remark that the gene tree constructed in the proof of Theorem 9 can be made into a binary tree by splitting the root ρT into a series of duplication and loss events, so that each subtree is the descendant of a different paralog. Since, by Theorem 9, there are no restrictions on the possible triple sets 𝔖(T,t, σ), it is clear that S will in general not be unique. An example is shown in Figure 4.

Figure 4. Inferred species trees. The set 𝔖(T,t, σ) inferred from the event-labeled gene tree (T,t, σ) does not necessarily define a unique species tree. For clarity of exposition, we have identified, via the map σ, the leaves of the gene tree and of the set of triples <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M113','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M113">View MathML</a> with the species they reside in.

Results for simulated gene trees

In order to determine empirically how much information on the species tree we can hope to find in event-labeled gene trees, we simulated species trees together with corresponding event-labeled gene trees with different duplication and loss rates. Approximately 150 species trees with 10 to 100 species were generated according to the "age model" [40]. These trees are balanced, and the edge lengths are normalized so that the total length of the path from the root to each leaf is 1. For each species tree, we then simulated a gene tree as described in [41], with duplication and loss rate parameters r ∈ [0, 1] sampled uniformly. Events are modeled by a Poisson distribution with parameter r · ℓ, where ℓ is the length of an edge as generated by the age model. Losses were additionally constrained to retain at least one copy in each species, i.e., σ(L) = B is enforced. After determining the triple set 𝔖(T,t, σ) according to Theorem 6, we used BUILD [27] (see also [42]) to compute the species tree. In all cases BUILD returns a tree that is a homomorphic contraction of the simulated species tree. The difference between the original and the reconstructed species tree is thus conveniently quantified as the difference in the number of interior vertices. Note that in our situation this is the same as the split metric [27].
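The event model described above can be illustrated with a few lines of Python. This is a minimal sketch of a single ingredient and not the simulator of [41]. It draws the numbers of duplications and losses on one edge of length ℓ from Poisson distributions with means r · ℓ. Sampling uses Knuth's multiplication method, which is adequate for the small means that arise when r ∈ [0, 1] and root-to-leaf path lengths are 1. The function names are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Poisson sample by Knuth's multiplication method (fine for small means)."""
    threshold, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_rates(rng):
    """Duplication and loss rate parameters drawn uniformly from [0, 1]."""
    return rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)


def events_on_edge(length, dup_rate, loss_rate, rng):
    """Numbers of duplications and losses on one edge of length ell,
    each Poisson-distributed with mean rate * ell."""
    return poisson(dup_rate * length, rng), poisson(loss_rate * length, rng)
```

A full simulation would additionally have to place the events along the edge, track the resulting gene copies, and reject losses that would leave a species without any copy.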

The results are summarized in Figure 5. Not surprisingly, the recoverable information decreases in particular with the rate of gene loss. Nevertheless, at least 50% of the splits in the species tree are recoverable even at very high loss rates. For moderate loss rates, in particular when gene losses are less frequent than gene duplications, nearly the complete information on the species tree is preserved. It is interesting to note that BUILD does not incorporate splits that are not present in the input tree, although this is not mathematically guaranteed.

Figure 5. Recovered splits in species trees. Left: Heat map of the percentage of splits recovered in the species tree inferred from triples obtained from simulated event-labeled gene trees with different loss and duplication rates. Right: Scatter plot showing the average numbers of losses and duplications in the generated data and the accuracy of the inferred species tree.

Discussion

Event-labeled gene trees can be obtained by combining the reconstruction of gene phylogenies with methods for orthology detection. Orthology alone already encapsulates partial information on the gene tree. More precisely, the orthology relation is equivalent to a homomorphic image of the gene tree in which adjacent vertices denote different types of events. We discussed here the properties of reconciliation maps µ from a gene tree T, together with an event labeling map t and a gene-to-species assignment map σ, to a species tree S. We showed that the event-labeled gene trees (T,t, σ) for which a species tree exists can be characterized in terms of the set 𝔖(T,t, σ) of triples, which is easily constructed from a subset of the triples of T. Simulated data show, furthermore, that such trees convey a large amount of information on the underlying species tree, even if the gene loss rate is high.

It can be expected that for real-life data the tree T contains errors so that <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M107','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M107">View MathML</a> may not be consistent. In this case, an approximation to the species tree could be obtained e.g. from a maximum consistent subset of <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M108','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M108">View MathML</a>. Although (the decision version of) this problem is NP-complete [43,44], there is a wide variety of practically applicable algorithms for this task, see [35,45]. Even if <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M109','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M109">View MathML</a> is consistent, the species tree is usually not uniquely determined. Algorithms to list all trees consistent with <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M110','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M110">View MathML</a> can be found e.g. in [46,47]. A characterization of triple sets that determine a unique tree can be found in [48]. Since our main interest is to determine the constraints imposed by (T,t, σ) on the species tree S, we are interested in a least resolved tree S that displays all triples in <a onClick="popup('http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M111','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S19/S6/mathml/M111">View MathML</a>. 
The BUILD algorithm and its relatives generally produce minor-minimal trees, but these are not guaranteed to have the minimal number of interior nodes. Finding a species tree with a minimal number of interior nodes is again a hard problem [37]. Nevertheless, the vertex-minimal trees are among the possibly exponentially many minor-minimal trees enumerated by Semple's algorithms [36].
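The consistency test for a triple set can be carried out with the BUILD algorithm of Aho et al. [30]. The sketch below is a plain recursive Python version under our own assumptions: a triple (x, y, z) encodes xy|z, trees are returned as nested tuples, and `None` signals that the triple set is inconsistent. It simply returns the tree that BUILD constructs and makes no attempt to find a vertex-minimal tree or to enumerate the alternatives.

```python
def build(triples, taxa):
    """BUILD (Aho et al.): a tree displaying every triple, or None if inconsistent.

    Assumed encoding: a triple (x, y, z) stands for xy|z; trees are nested tuples.
    """
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    # Keep only triples whose three taxa all lie in the current taxon set.
    relevant = [t for t in triples if set(t) <= taxa]
    # Aho graph: join x and y for every relevant triple xy|z.
    adj = {t: set() for t in taxa}
    for x, y, _ in relevant:
        adj[x].add(y)
        adj[y].add(x)
    # Connected components of the Aho graph (iterative depth-first search).
    components, seen = [], set()
    for start in taxa:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    if len(components) == 1:
        return None  # connected Aho graph: no tree displays all triples
    subtrees = []
    for comp in components:
        sub = build(relevant, comp)
        if sub is None:
            return None
        subtrees.append(sub)
    # Sort children so the output does not depend on set iteration order.
    return tuple(sorted(subtrees, key=repr))


print(build({("A", "B", "C")}, "ABC"))  # ('C', ('A', 'B'))
print(build({("A", "B", "C"), ("B", "C", "A")}, "ABC"))  # None
```

Each recursion level turns every connected component of the Aho graph into one child of the current vertex, which is why the result can contain more interior nodes than strictly necessary.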

For a given species tree S, it is rather easy to find a reconciliation map μ from (T,t,σ) to S. A simple solution μ is closely related to the so-called LCA reconciliation: every node x of T is mapped either to the last common ancestor of the species below it, lca_S(σ(L(x))), or to the edge immediately above this vertex, depending on whether x is a speciation or a duplication node. While this solution is unique for the speciation nodes, alternative mappings are possible for the duplication nodes, so the set of possible reconciliation maps can still be very large despite the prescribed event labels. If the event labeling t is unknown, there is a reconciliation from any gene tree T to any species tree S, realized in particular by the LCA reconciliation; see, e.g., [26,38]. The reconciliation then defines the event types, and a parsimony rule is typically employed to choose a reconciliation map in which the number of duplications and losses is minimized; see, e.g., [1,4,5,9]. In our setting, on the other hand, the event types are prescribed. This restricts the possible reconciliation maps, so that the gene tree can no longer be reconciled with an arbitrary species tree. Since the observable events on the gene tree are fixed, the possible reconciliations cannot differ in the number of duplications. Still, one may be interested in reconciliation maps that minimize the number of loss events. An alternative is to maximize the number of duplication events that map to the same edge in S, in order to account for whole-genome and chromosomal duplication events [9].
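The LCA-based choice of μ described above can be sketched in a few lines of Python. The sketch assumes trees stored as child dictionaries and event labels "spec"/"dup" on the inner gene-tree nodes, and it represents the edge above a species-tree vertex y as the pair (parent of y, y), with (None, root) for the edge above the root; these conventions and all function names are hypothetical. It returns only this one canonical map: it neither checks that the prescribed labels are compatible with S nor explores the alternative placements of duplication nodes.

```python
def parents_and_depths(children, root):
    """Parent and depth maps for a rooted tree given as a child dictionary."""
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            parent[c] = v
            depth[c] = depth[v] + 1
            stack.append(c)
    return parent, depth


def lca(u, v, parent, depth):
    """Last common ancestor of u and v in the species tree."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def lca_reconciliation(gene_children, gene_root, event, sigma,
                       species_children, species_root):
    """Map gene-tree nodes to lca_S(sigma(L(x))) or to the edge above it."""
    sparent, sdepth = parents_and_depths(species_children, species_root)
    mu = {}

    def visit(x):
        kids = gene_children.get(x, [])
        if not kids:
            mu[x] = sigma[x]  # a gene maps to the species it belongs to
            return {sigma[x]}
        species = set()
        for c in kids:
            species |= visit(c)  # species set sigma(L(x)) below x
        it = iter(species)
        y = next(it)
        for s in it:
            y = lca(y, s, sparent, sdepth)
        # speciation -> vertex y of S; duplication -> edge (parent of y, y) above it
        mu[x] = y if event[x] == "spec" else (sparent[y], y)
        return species

    visit(gene_root)
    return mu


species_tree = {"R": ["P", "C"], "P": ["A", "B"]}
gene_tree = {"r": ["x1", "x2"], "x1": ["a1", "b1"], "x2": ["a2", "b2"]}
labels = {"r": "dup", "x1": "spec", "x2": "spec"}
gene_species = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
mu = lca_reconciliation(gene_tree, "r", labels, gene_species, species_tree, "R")
print(mu["x1"], mu["r"])  # P ('R', 'P')
```

In this toy example the duplication r precedes two speciations that both separate A from B, so it is placed on the edge leading into P rather than on the vertex P itself.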

Conclusions

Our approach to the reconciliation problem via event-labeled gene trees opens up some interesting new avenues to understanding orthology. In particular, the results in this contribution combined with those in [22] concerning cographs should ultimately lead to a method for automatically generating orthology relations that takes into account species relationships without having to explicitly compute gene trees. This is potentially very useful since gene tree estimation is one of the weak points of most current approaches to orthology analysis.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All authors contributed to the development of the theory. MHR and NW produced the simulated data. All authors contributed to writing, reading, and approving the final manuscript.

Acknowledgements

This work was supported in part by the Volkswagen Stiftung (proj. no. I/82719) and the Deutsche Forschungsgemeinschaft (SPP-1174 "Deep Metazoan Phylogeny", proj. nos. STA 850/2 and STA 850/3).

This article has been published as part of BMC Bioinformatics Volume 13 Supplement 19, 2012: Proceedings of the Tenth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S19

References

  1. Guigó R, Muchnik I, Smith TF: Reconstruction of ancient molecular phylogeny.

    Mol Phylogenet Evol 1996, 6:189-213.

  2. Page RD, Charleston MA: From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem.

    Mol Phylogenet Evol 1997, 7:231-240.

  3. Arvestad L, Berglund AC, Lagergren J, Sennblad B: Bayesian gene/species tree reconciliation and orthology analysis using MCMC.

    Bioinformatics 2003, 19:i7-i15.

  4. Bonizzoni P, Della Vedova G, Dondi R: Reconciling a gene tree to a species tree under the duplication cost model.

    Theor Comp Sci 2005, 347:36-53.

  5. Górecki P, Tiuryn J: DSL-trees: A model of evolutionary scenarios.

    Theor Comp Sci 2006, 359:378-399.

  6. Hahn MW: Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution.

    Genome Biol 2007, 8:R141.

  7. Bansal MS, Eulenstein O: The multiple gene duplication problem revisited.

    Bioinformatics 2008, 24:i132-i138.

  8. Chauve C, Doyon JP, El-Mabrouk N: Gene family evolution by duplication, speciation, and loss.

    J Comput Biol 2008, 15:1043-1062.

  9. Burleigh JG, Bansal MS, Wehe A, Eulenstein O: Locating large-scale gene duplication events through reconciled trees: implications for identifying ancient polyploidy events in plants.

    J Comput Biol 2009, 16:1071-1083.

  10. Larget BR, Kotha SK, Dewey CN, Ane C: BUCKy: gene tree/species tree reconciliation with Bayesian concordance analysis.

    Bioinformatics 2010, 26:2910-2911.

  11. Li H, Coghlan A, Ruan J, Coin LJ, Hériché JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J, Durbin R: TreeFam: a curated database of phylogenetic trees of animal gene families.

    Nucleic Acids Res 2006, 34:D572-D580.

  12. Goodstadt L, Ponting CP: Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human.

    PLoS Comput Biol 2006, 2:e133.

  13. Datta RS, Meacham C, Samad B, Neyer C, Sjölander K: Berkeley PHOG: PhyloFacts orthology group prediction web server.

    Nucleic Acids Res 2009, 37:W84-W89.

  14. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E: Ensembl 2007.

    Nucleic Acids Res 2007, 35:D610-617.

  15. Pryszcz LP, Huerta-Cepas J, Gabaldón T: MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score.

    Nucleic Acids Res 2011, 39:e32.

  16. Altenhoff AM, Dessimoz C: Phylogenetic and functional assessment of orthologs inference projects and methods.

    PLoS Comput Biol 2009, 5:e1000262.

  17. Li L, Stoeckert CJ Jr, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes.

    Genome Res 2003, 13:2178-2189.

  18. Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution.

    Nucleic Acids Res 2000, 28:33-36.

  19. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Ostell J, Pruitt KD, Schuler GD, Shumway M, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information.

    Nucleic Acids Res 2008, 36:D13-D21.

  20. Berglund AC, Sjölund E, Ostlund G, Sonnhammer EL: InParanoid 6: eukaryotic ortholog clusters with inparalogs.

    Nucleic Acids Res 2008, 36:D263-D266.

  21. Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ: Proteinortho: Detection of (Co-)Orthologs in Large-Scale Analysis.

    BMC Bioinformatics 2011, 12:124.

  22. Hellmuth M, Hernandez-Rosales M, Huber KT, Moulton V, Stadler PF, Wieseke N: Orthology Relations, Symbolic Ultrametrics, and Cographs.

    J Math Biol 2012.

  23. Fitch WM: Homology: a personal view on some of the problems.

    Trends Genet 2000, 16:227-231.

  24. Böcker S, Dress AWM: Recovering symbolically dated, rooted trees from symbolic ultrametrics.

    Adv Math 1998, 138:105-125.

  25. Brandstädt A, Le VB, Spinrad JP: Graph Classes: A Survey. SIAM Monographs on Discrete Mathematics and Applications, Philadelphia: Soc. Ind. Appl. Math; 1999.

  26. Chauve C, El-Mabrouk N: New Perspectives on Gene Family Evolution: Losses in Reconciliation and a Link with Supertrees.

    LNCS 2009, 5541:46-58.

  27. Semple C, Steel M: Phylogenetics, Volume 24 of Oxford Lecture Series in Mathematics and its Applications. Oxford, UK: Oxford University Press; 2003.

  28. Dress AWM, Huber KT, Koolen J, Moulton V, Spillner A: Basic Phylogenetic Combinatorics. Cambridge: Cambridge University Press; 2011.

  29. Bininda-Emonds O: Phylogenetic Supertrees. Dordrecht, NL: Kluwer Academic Press; 2004.

  30. Aho AV, Sagiv Y, Szymanski TG, Ullman JD: Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions.

    SIAM J Comput 1981, 10:405-421.

  31. Rauch Henzinger M, King V, Warnow T: Constructing a Tree from Homeomorphic Subtrees, with Applications to Computational Evolutionary Biology.

    Algorithmica 1999, 24:1-13.

  32. Jansson J, Ng JHK, Sadakane K, Sung WK: Rooted maximum agreement supertrees.

    Algorithmica 2005, 43:293-307.

  33. Byrka J, Gawrychowski P, Huber KT, Kelk S: Worst-case optimal approximation algorithms for maximizing triplet consistency within phylogenetic networks.

    J Discr Alg 2010, 8:65-75.

  34. van Iersel L, Kelk S, Mnich M: Uniqueness, intractability and exact algorithms: reflections on level-k phylogenetic networks.

    J Bioinf Comp Biol 2009, 7:597-623.

  35. Byrka J, Guillemot S, Jansson J: New results on optimizing rooted triplets consistency.

    Discr Appl Math 2010, 158:1136-1147.

  36. Semple C: Reconstructing minimal rooted trees.

    Discr Appl Math 2003, 127:489-503.

  37. Jansson J, Lemence RS, Lingas A: The Complexity of Inferring a Minimally Resolved Phylogenetic Supertree.

    SIAM J Comput 2012, 41:272-291.

  38. Doyon JP, Chauve C, Hamel S: Space of Gene/Species Trees Reconciliations and Parsimonious Models.

    J Comp Biol 2009, 16:1399-1418.

  39. Zhang L: On a Mirkin-Muchnik-Smith conjecture for comparing molecular phylogenies.

    J Comput Biol 1997, 4:177-187.

  40. Keller-Schmidt S, Tuğrul M, Eguíluz VM, Hernández-García E, Klemm K: An Age Dependent Branching Model for Macroevolution.

    Tech Rep 2010, arXiv:1012.3298v1.

  41. Hernandez-Rosales M, Wieseke N, Hellmuth M, Stadler PF: Simulation of Gene Family Histories. [http://www.bioinf.uni-leipzig.de/Publications/PREPRINTS/12-017.pdf]

    Tech Rep Univ. Leipzig; 2011, 12-017.

  42. Aho AV, Sagiv Y, Szymanski TG, Ullman JD: Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions.

    SIAM J Comput 1981, 10:405-421.

  43. Jansson J: On the Complexity of Inferring Rooted Evolutionary Trees.

    Electronic Notes Discr Math 2001, 7:50-53.

  44. Wu BY: Constructing the Maximum Consensus Tree from Rooted Triples.

    J Comb Optimization 2004, 8:29-39.

  45. He YJ, Huynh TN, Jansson J, Sung WK: Inferring phylogenetic relationships avoiding forbidden rooted triplets.

    J Bioinform Comput Biol 2006, 4:59-74.

  46. Ng MP, Wormald NC: Reconstruction of rooted trees from subtrees.

    Discr Appl Math 1996, 69:19-31.

  47. Constantinescu M, Sankoff D: An efficient algorithm for supertrees.

    J Classification 1995, 12:101-112.

  48. Bryant D, Steel M: Extension Operations on Sets of Leaf-Labeled Trees.

    Adv Appl Math 1995, 16:425-453.