Open Access Research article

PhylDiag: identifying complex synteny blocks that include tandem duplications using phylogenetic gene trees

Joseph MEX Lucas1,2,3, Matthieu Muffato4 and Hugues Roest Crollius1,2,3*

* Corresponding author: Hugues R Crollius (hrc@ens.fr)

Author Affiliations

1 Ecole Normale Supérieure, Institut de Biologie de l’ENS, IBENS, 46 rue d’Ulm, 75005 Paris, France

2 CNRS, UMR 8197, 75005 Paris, France

3 Inserm, U1024, 75005 Paris, France

4 The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Düsternbrooker Weg 20, CB10 1SD Hinxton, Cambridge, UK

BMC Bioinformatics 2014, 15:268  doi:10.1186/1471-2105-15-268

Published: 8 August 2014

Abstract

Background

Extant genomes share regions where genes have the same order and orientation, which are thought to arise from the conservation of an ancestral gene order during evolution. Such regions of so-called conserved synteny, or synteny blocks, must be precisely identified and quantified as a prerequisite to better understanding the evolutionary history of genomes.

Results

Here we describe PhylDiag, a software tool that identifies statistically significant synteny blocks in pairwise comparisons of eukaryote genomes. Unlike previous methods, PhylDiag uses gene trees to define gene homologies, which allows gene deletions to be considered as events that may break the synteny. PhylDiag also accounts for gene orientations, blocks of tandem duplicates and lineage-specific de novo gene births. Starting from two genomes and the corresponding gene trees, PhylDiag returns synteny blocks with gaps less than or equal to the maximum gap parameter gapmax. This parameter is theoretically estimated and, together with a utility to graphically display results, contributes to making PhylDiag a user-friendly method. In addition, putative synteny blocks are subjected to a statistical validation to verify that they are unlikely to be due to a random combination of genes.
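The idea of chaining homologies into blocks with gaps no larger than gapmax can be illustrated with a minimal sketch. This is not PhylDiag's implementation: the function names are hypothetical, each genome is assumed to be a single chromosome given as a list of gene family identifiers (standing in for families derived from gene trees), the gap is assumed to be the Chebyshev distance minus one, and gene orientation, tandem duplicates, inverted diagonals and the statistical validation are all left out.

```python
def homology_points(genome_a, genome_b):
    """Return (i, j) pairs where genes at positions i and j share a family."""
    positions_b = {}
    for j, family in enumerate(genome_b):
        positions_b.setdefault(family, []).append(j)
    return [(i, j)
            for i, family in enumerate(genome_a)
            for j in positions_b.get(family, [])]


def chain_blocks(points, gapmax):
    """Greedily chain homologies along forward diagonals with gaps <= gapmax.

    Assumption: the gap between two homologies is their Chebyshev
    distance minus one, so directly adjacent diagonal cells have gap 0.
    """
    unused = set(points)
    blocks = []
    for start in sorted(points):
        if start not in unused:
            continue
        unused.remove(start)
        block = [start]
        while True:
            i, j = block[-1]
            # Candidates lie strictly down-right of the current homology.
            candidates = [
                (max(p[0] - i, p[1] - j) - 1, p)
                for p in unused
                if p[0] > i and p[1] > j
                and max(p[0] - i, p[1] - j) - 1 <= gapmax
            ]
            if not candidates:
                break
            _, nearest = min(candidates)  # smallest gap first
            unused.remove(nearest)
            block.append(nearest)
        if len(block) > 1:  # a lone homology is not a block
            blocks.append(block)
    return blocks


genome_a = ["A", "B", "X", "C", "D"]
genome_b = ["A", "B", "C", "Q", "D"]
points = homology_points(genome_a, genome_b)
blocks_gap1 = chain_blocks(points, gapmax=1)
blocks_gap0 = chain_blocks(points, gapmax=0)
```

In this toy example, allowing one gap joins all four shared families into a single block, while gapmax=0 keeps only the two strictly consecutive homologies.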

Conclusions

We benchmark several known metrics to measure 2D-distances in a matrix of homologies and we compare PhylDiag to i-ADHoRe 3.0 on real and simulated data. We show that PhylDiag correctly identifies small synteny blocks even with insertions, deletions, incorrect annotations or micro-inversions. Finally, PhylDiag allowed us to identify the most relevant distance metric for 2D-distance calculation between homologies.
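To make the notion of a 2D-distance between homologies concrete, the sketch below computes three standard metrics between two cells of a homology matrix. The abstract does not say which metrics were benchmarked, so these three are assumed examples only, and the function names are hypothetical.

```python
import math


def chebyshev(p, q):
    """Largest coordinate difference between two matrix cells."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def manhattan(p, q):
    """Sum of the coordinate differences between two matrix cells."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    """Straight-line distance between two matrix cells."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Two homologies at (row, column) positions in the homology matrix.
first, second = (0, 0), (3, 4)
distances = {
    "chebyshev": chebyshev(first, second),
    "manhattan": manhattan(first, second),
    "euclidean": euclidean(first, second),
}
```

The same pair of homologies can fall inside or outside a given gapmax depending on the metric chosen, which is why the choice of metric matters for block detection.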

Keywords:
Comparative genomics; Synteny; Synteny block; Segmental homologies; Homology; Gene order; Rearrangement; Ancestral genome; Gene tree