PhylDiag: identifying complex synteny blocks that include tandem duplications using phylogenetic gene trees
1 Ecole Normale Supérieure, Institut de Biologie de l’ENS, IBENS, 46 rue d’Ulm, 75005 Paris, France
2 CNRS, UMR 8197, 75005 Paris, France
3 Inserm, U1024, 75005 Paris, France
4 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD Hinxton, Cambridge, UK
BMC Bioinformatics 2014, 15:268 doi:10.1186/1471-2105-15-268. Published: 8 August 2014
Extant genomes share regions where genes have the same order and orientation, which are thought to arise from the conservation of an ancestral gene order during evolution. Such regions of so-called conserved synteny, or synteny blocks, must be precisely identified and quantified as a prerequisite to better understanding the evolutionary history of genomes.
Here we describe PhylDiag, software that identifies statistically significant synteny blocks in pairwise comparisons of eukaryote genomes. Unlike previous methods, PhylDiag uses gene trees to define gene homologies, which allows gene deletions to be considered as events that may break synteny. PhylDiag also accounts for gene orientations, blocks of tandem duplicates and lineage-specific de novo gene births. Starting from two genomes and the corresponding gene trees, PhylDiag returns synteny blocks whose gaps are less than or equal to the maximum gap parameter gapmax. This parameter is estimated theoretically and, together with a utility that displays results graphically, helps make PhylDiag a user-friendly method. In addition, putative synteny blocks undergo statistical validation to verify that they are unlikely to be due to a random combination of genes.
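The role of gapmax can be illustrated with a minimal sketch. It is not PhylDiag's implementation: it omits gene orientation, tandem-duplicate handling, diagonal consistency and the statistical validation, and it simply groups homology points by single linkage. The function names, the use of family labels in place of gene trees, and the convention that the gap between two points equals their Chebyshev distance minus one are all assumptions made for illustration.

```python
from itertools import product


def homology_points(chrom_a, chrom_b):
    """Return (i, j) pairs where gene i of chrom_a and gene j of chrom_b
    carry the same family label (hypothetical stand-in for gene-tree homology)."""
    return [
        (i, j)
        for (i, fam_a), (j, fam_b) in product(enumerate(chrom_a), enumerate(chrom_b))
        if fam_a == fam_b
    ]


def chebyshev(p, q):
    """Chebyshev distance between two points of the homology matrix."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def group_points(points, gapmax):
    """Single-linkage grouping: two points are linked when the assumed gap
    (Chebyshev distance minus one) is at most gapmax."""
    unvisited = set(points)
    groups = []
    while unvisited:
        seed = unvisited.pop()
        group, stack = [seed], [seed]
        while stack:
            p = stack.pop()
            near = [q for q in unvisited if chebyshev(p, q) - 1 <= gapmax]
            for q in near:
                unvisited.remove(q)
                group.append(q)
                stack.append(q)
        groups.append(sorted(group))
    return sorted(groups)


# Toy chromosomes written as lists of gene family labels (invented data).
chrom_a = ["f1", "f2", "f3", "x1", "f4", "f9"]
chrom_b = ["f1", "f2", "y1", "f3", "f4", "z1", "z2", "z3", "f9"]
points = homology_points(chrom_a, chrom_b)
blocks = group_points(points, gapmax=1)
```

With gapmax=1 the first four homologies form one group and the last one stays isolated; raising gapmax to 3 merges everything into a single group, which shows why the choice of this parameter matters.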
We benchmark several known metrics for measuring 2D distances in a matrix of homologies, and we compare PhylDiag to i-ADHoRe 3.0 on real and simulated data. We show that PhylDiag correctly identifies small synteny blocks even in the presence of insertions, deletions, incorrect annotations or micro-inversions. Finally, PhylDiag allowed us to identify the most relevant distance metric for 2D-distance calculation between homologies.
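To make the notion of a 2D distance between homologies concrete, the sketch below evaluates three standard metrics on the same pair of matrix coordinates. The abstract does not name the metrics that were benchmarked, so the selection here (Manhattan, Euclidean, Chebyshev) and the example coordinates are assumptions for illustration only.

```python
import math

# Standard 2D distance metrics between two homology-matrix points p and q.
METRICS = {
    "manhattan": lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1]),
    "euclidean": lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1]),
    "chebyshev": lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1])),
}

# Two hypothetical homologies: 2 genes apart on one genome, 3 on the other.
p, q = (0, 0), (2, 3)
distances = {name: metric(p, q) for name, metric in METRICS.items()}
```

The same pair of homologies is 5 apart under the Manhattan metric, about 3.61 under the Euclidean metric and 3 under the Chebyshev metric, so a fixed distance threshold admits or rejects the pair depending on the metric chosen.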