Max-Planck-Institute for Mathematics in the Sciences, Leipzig, D-04103, Germany

Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, D-04107, Germany

Center for Bioinformatics, Saarland University, Saarbrücken, D-66041, Germany

High Throughput Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller Universität Jena, Jena, D-07743, Germany

Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, Leipzig, D-04103, Germany

School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK

Inst. f. Theoretical Chemistry, University of Vienna, Vienna, A-1090, Austria

Santa Fe Institute, Santa Fe, NM, 87501, USA

Abstract

Background

Tree reconciliation problems have long been studied in phylogenetics. A particular variant of the reconciliation problem for a gene tree

Results

We show that

Conclusions

The knowledge of event labels in a gene tree strongly constrains the possible species tree and, for a given species tree, also the possible reconciliation maps. Nevertheless, many degrees of freedom remain in the space of feasible solutions. In order to disambiguate the alternative solutions, additional external constraints as well as optimization criteria could be employed.

Background

The reconstruction of the evolutionary history of a gene family is necessarily based on at least three interrelated types of information. The true phylogeny of the investigated species is required as a scaffold with which the associated gene tree must be reconcilable. Orthology or paralogy of genes found in different species determines whether an internal vertex in the gene tree corresponds to a duplication or a speciation event. Speciation events, in turn, are reflected in the species tree.

The reconciliation of gene and species trees is a widely studied problem

Although orthology information is often derived from the reconciliation of a gene tree with a species tree (cf. e.g. TreeFam

According to Fitch's definition

This observation suggests that a viable approach to reconstructing histories of large gene families may start from an empirically determined orthology relation, which can be directly adjusted to conform to the requirement of being a cograph. The result is then equivalent to a (usually incompletely resolved) event-labeled gene tree, which might be refined or used as a constraint in the inference of a fully resolved gene tree. In this contribution we are concerned with the next conceptual step: the derivation of a species tree from an event-labeled gene tree. As we shall see below, this problem is much simpler than the full tree reconciliation problem. Technically, we will approach this problem by reducing the reconciliation map from gene tree to species tree to rooted triples of genes residing in three distinct species. This is related to an approach that was developed in

Methods

Definitions and notation

Phylogenetic trees

A ^{0 }= _{T }_{T }_{T }_{T }_{T}_{T }_{T }**{****}**, we put lca_{T }_{T }**{****}**) and if **{****}**, we put lca_{T }_{T }**{****}**). For later reference, we have, for all _{T }_{T }_{T }** ^{' }**is the phylogenetic tree with leaf set

It will be convenient for our discussion below to extend the ancestor relation

Rooted triples

Rooted triples are phylogenetic trees on three leaves with precisely two interior vertices. Sometimes also called rooted triplets **{****}**. Then we denote by ((_{r }
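As an illustration of when a tree displays a rooted triple, the following minimal Python sketch tests this via last common ancestors: a triple ((a,b),c) is displayed when c does not lie below the last common ancestor of a and b. The nested-tuple tree encoding and the function names `leaf_set`, `lca_cluster`, and `displays` are assumptions made for this example and do not come from the text.

```python
def leaf_set(tree):
    """Leaves of a tree given as nested tuples with string leaves (assumed encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def lca_cluster(tree, targets):
    """Leaf set below the last common ancestor of `targets` (assumed to be leaves of `tree`)."""
    targets = frozenset(targets)
    if not isinstance(tree, str):
        for child in tree:
            # Descend while a single child still contains all targets.
            if targets <= leaf_set(child):
                return lca_cluster(child, targets)
    return leaf_set(tree)


def displays(tree, triple):
    """True if `tree` displays the rooted triple ((a, b), c)."""
    (a, b), c = triple
    if not {a, b, c} <= leaf_set(tree):
        return False
    # lca(a, b) must lie strictly below lca(a, b, c), i.e. c is not below lca(a, b).
    return c not in lca_cluster(tree, {a, b})


example_tree = (("a", "b"), ("c", "d"))
print(displays(example_tree, (("a", "b"), "c")))
print(displays(example_tree, (("a", "c"), "b")))
```

A star tree such as `("a", "b", "c")` displays none of the three possible triples on its leaves, which is why incompletely resolved trees carry less triple information.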

Clearly, a set

The problem of determining a maximum consistent subset

The BUILD algorithm, furthermore, does not necessarily generate for a given triple set **' **is obtained from **' **does not display
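For readers unfamiliar with BUILD, the following is a minimal Python sketch of the classical recursive procedure: connect a and b for every triple ((a,b),c) restricted to the current leaf set, fail if the resulting graph is connected, and otherwise recurse on the connected components. The function name `build`, the nested-tuple output, and the use of `None` to signal inconsistency are choices made for this example, not notation from the text.

```python
def build(leaves, triples):
    """Return a nested-tuple tree displaying all `triples`, or None if none exists.

    `triples` is an iterable of ((a, b), c) with a, b, c leaf names.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only triples whose three leaves all survive in this subproblem matter.
    relevant = [t for t in triples if {t[0][0], t[0][1], t[1]} <= leaves]
    for (a, b), _c in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component on two or more leaves means the triples are inconsistent.
    if len(components) == 1:
        return None

    subtrees = []
    for comp in sorted(components.values(), key=sorted):
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


print(build({"a", "b", "c", "d"}, [(("a", "b"), "c"), (("c", "d"), "a")]))
# -> (('a', 'b'), ('c', 'd'))
```

The components are sorted only to make the output deterministic. The tree returned is one tree displaying the input triples, not necessarily the only one.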

Event labeling, species labeling, and reconciliation map

A gene tree **|****| ≥ **3 and **|****| ≥ **1. We consider only gene duplications and gene losses, which take place between speciation events, i.e., along the edges of

The true evolutionary history of a single ancestral gene thus can be thought of as a scenario such as the one depicted in Figure

**Gene trees**. **Left: **Example of an evolutionary scenario showing the evolution of a gene family. The corresponding true gene tree **Right: **The corresponding gene tree

In order to allow _{B }

The true gene tree

Furthermore, we can observe a map σ: **{**σ(**} **for any subtree **{**σ(**}**.

The observable part of the species tree _{S }_{s}B

The evolutionary scenario also implies an **·**) or a duplication event (□). It is convenient to use the special label ⊙ for the leaves **{****} **∈

(C) Let **·**, and let **∩ **σ (

Note that we do not require the converse, i.e., from the disjointness of the species sets σ (**not **conclude that their last common ancestor is a speciation vertex.

For _{T }_{T }**• **then σ(_{T }

Let us now consider the properties of the restriction of _{T }

**Definition 1**. **→ ****→ **

_{S}

We note that ^{-1}(_{S}_{S}_{S}

We illustrate this definition by means of an example in Figure

**Mapping μ**. Example of the mapping

**Lemma 2**.

_{S}

_{S}

For

Results and discussion

Results

Unless stated otherwise, we continue with our assumptions on

**Lemma 3**.

_{T }_{T }_{T }_{T }_{S}

Equation (1) is well known to hold for gene tree/species reconciliation in the absence of a prescribed event labeling in

Since a phylogenetic tree (in the original sense)

As we shall see below,

**Lemma 4**.

_{T }_{T }_{S}_{T }_{S}_{T }_{T }_{T }

Now suppose that _{T }_{T }_{T }

It is important to note that a similar argument cannot be made for triples in

**Triples with duplication event at the root**. Triples from

**Definition 5**.

As an immediate consequence of Lemma 4,

**Theorem 6**.

We explicitly construct the map

(M2) _{S}

Note that alternative (M1) ensures that _{S}**\ **

**Claim: **If

Since **\ **

Now suppose **\ ****\ **

Next, we extend the map _{S}

(M3) _{S}

which now makes

By construction, Conditions

It follows that

**Corollary 7**. **||**

**||**_{S}

We remark that given a species tree

Lemma 4 implies that consistency of the triple set

**Theorem 8**. **σ**)

We remark that a related result is proven in [26, Theorem.5] for the full tree reconciliation problem starting from a forest of gene trees.
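The step from an event-labeled gene tree to a set of species triples can be sketched in Python as follows. This is a hedged reading of the construction discussed in this section: it collects gene triples ((x,y),z) whose three genes reside in pairwise distinct species and whose root is a speciation vertex, and maps them to species via σ. The tree encoding (an interior vertex is a pair `(event, children)` with event `"S"` or `"D"`, a leaf is a gene name), the dictionary `sigma`, and all function names are assumptions made for this illustration.

```python
from itertools import combinations


def genes_below(node):
    """All genes (leaves) in the subtree rooted at `node`."""
    if isinstance(node, str):
        return [node]
    _event, children = node
    return [g for child in children for g in genes_below(child)]


def species_triples(node, sigma):
    """Species triples induced by gene triples rooted at speciation vertices.

    A triple is stored as (frozenset({a, b}), c), meaning ((a, b), c).
    """
    triples = set()
    if isinstance(node, str):
        return triples
    event, children = node
    for child in children:
        triples |= species_triples(child, sigma)
    if event != "S":
        # Triples rooted at a duplication vertex are not used.
        return triples
    below = [genes_below(child) for child in children]
    for i, inner in enumerate(below):
        outer = [g for j, other in enumerate(below) if j != i for g in other]
        # x and y share a child of this vertex, z lies below a different child,
        # so the last common ancestor of x, y, z is exactly this vertex.
        for x, y in combinations(inner, 2):
            for z in outer:
                a, b, c = sigma[x], sigma[y], sigma[z]
                if len({a, b, c}) == 3:
                    triples.add((frozenset((a, b)), c))
    return triples


sigma = {"a1": "A", "b1": "B", "c1": "C"}
gene_tree = ("S", [("S", ["a1", "b1"]), "c1"])
print(species_triples(gene_tree, sigma))
```

Enumerating all gene triples is cubic in the number of genes, which is acceptable for a sketch but would need care for large gene families.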

It may be surprising that there are no strong restrictions on the set

**Theorem 9**. **→ **

_{k}_{k}_{l}_{l }_{T }_{k }_{T }_{k }_{k}_{T }**− **(**{**_{T }**}**). Finally, we define the map **→ **_{k }_{k}

We remark that the gene tree constructed in the proof of Theorem 9 can be made into a binary tree by splitting the root _{T }

**Inferred species trees**. The set

Results for simulated gene trees

In order to determine empirically how much information on the species tree we can hope to find in event-labeled gene trees, we simulated species trees together with corresponding event-labeled gene trees with different duplication and loss rates. Approximately 150 species trees with 10 to 100 species were generated according to the "age model"

The results are summarized in Figure

**Recovered splits in species trees**. **Left: **Heat map that represents the percentage of recovered splits in the inferred species tree from triples obtained from simulated event-labeled gene trees with different loss and duplication rates. **Right: **Scattergram that shows the average of losses and duplications in the generated data and the accuracy of the inferred species tree.
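One plausible way to compute a "fraction of recovered splits" is sketched below in Python. It is an assumption about the measure, not a description of the procedure actually used for the figure: each interior vertex of a rooted tree is identified with the set of leaves below it, and the score is the share of non-trivial leaf sets of the true species tree that also occur in the inferred tree. The nested-tuple encoding and the function names are hypothetical.

```python
def clusters(tree):
    """Leaf sets below every interior vertex of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    visit(tree)
    return found


def nontrivial_clusters(tree):
    """Clusters without the root cluster, which every tree on the same leaves shares."""
    found = clusters(tree)
    if not found:
        return found
    root = max(found, key=len)
    return found - {root}


def recovered_fraction(true_tree, inferred_tree):
    """Share of non-trivial clusters of `true_tree` present in `inferred_tree`."""
    truth = nontrivial_clusters(true_tree)
    if not truth:
        return 1.0
    return len(truth & nontrivial_clusters(inferred_tree)) / len(truth)


true_tree = ((("A", "B"), "C"), "D")
inferred_tree = (("A", "B"), "C", "D")
print(recovered_fraction(true_tree, inferred_tree))
# -> 0.5
```

Under this measure, an incompletely resolved inferred tree is penalized only for the clusters it fails to resolve, not for introducing wrong ones; a symmetric distance would be needed to capture both kinds of error.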

Discussion

Event-labeled gene trees can be obtained by combining the reconstruction of gene phylogenies with methods for orthology detection. Orthology alone already encapsulates partial information on the gene tree. More precisely, the orthology relation is equivalent to a homomorphic image of the gene tree in which adjacent vertices denote different types of events. We discussed here the properties of reconciliation maps

It can be expected that for real-life data the tree

For a given species tree _{S}

Conclusions

Our approach to the reconciliation problem via event-labeled gene trees opens up some interesting new avenues to understanding orthology. In particular, the results in this contribution combined with those in

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All authors contributed to the development of the theory. MHR and NW produced the simulated data. All authors contributed to writing, reading, and approving the final manuscript.

Acknowledgements

This work was supported in part by the

This article has been published as part of