Open Access Highly Accessed Research article

Phylogenomics supports microsporidia as the earliest diverging clade of sequenced fungi

Salvador Capella-Gutiérrez, Marina Marcet-Houben and Toni Gabaldón*

Author Affiliations

Bioinformatics and Genomics Programme. Centre for Genomic Regulation (CRG) and UPF. Doctor Aiguader, 88. 08003 Barcelona, Spain

BMC Biology 2012, 10:47  doi:10.1186/1741-7007-10-47

Published: 31 May 2012

Additional files

Additional file 1:

Supplementary tables and figures cited in the main text. Legends can be found below.

Additional file 1, Figure S1

Example illustrating the confounding effect of recent segmental duplications on the detection of conserved syntenic pairs. The figure shows four syntenic pairs detected between the microsporidian Encephalitozoon cuniculi (code names in green) and the zygomycete Rhizopus oryzae (code names in orange) using the "relaxed synteny" approach described in [14].

Relative locations in the genome are shown next to the relevant phylogenetic trees present in the reconstructed E. cuniculi phylome. Note that one of the genes was not included in the phylogenetic reconstruction because it did not pass the thresholds used. From the topology of the tree it is clear that the R. oryzae genes are paralogous to each other and that they result from a lineage-specific duplication that conserved the neighborhood of the genes. This leads to an over-estimation of the number of conserved syntenic pairs.

Additional file 1, Figure S2

Analysis of the microsporidian sister groups in the trees from all microsporidian phylomes in which at least one member of each predefined group is present and the out-group species are monophyletic. Groups of bars represent the percentage of trees that detect a given fungal group as sister to microsporidians. Differently colored bars represent the percentage of trees remaining after applying filters designed to discard trees that are more likely to contain phylogenetic noise. From darker to lighter, the bars represent: all the trees; trees where the branch support of the node defining the association of microsporidians and their sister group is higher than 0.8; trees where the alignment has an average consistency score over 0.75; trees where the alignment is longer than 500 amino acids; and trees that pass all the filters.
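The filtering described in this legend can be sketched as follows. This is only an illustration under stated assumptions: the `TreeRecord` type, its field names and the helper functions are hypothetical and do not come from the paper; only the three thresholds (0.8, 0.75 and 500) are taken from the legend.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TreeRecord:
    """Hypothetical summary of one phylome tree (field names are assumptions)."""
    sister_group: str          # fungal group found as sister to microsporidians
    branch_support: float      # support of the node joining them
    consistency_score: float   # average alignment consistency score
    alignment_length: int      # alignment length in amino acids


def high_support(tree):
    return tree.branch_support > 0.8


def consistent_alignment(tree):
    return tree.consistency_score > 0.75


def long_alignment(tree):
    return tree.alignment_length > 500


def passes_all(tree):
    return high_support(tree) and consistent_alignment(tree) and long_alignment(tree)


def sister_group_percentages(trees, keep=lambda tree: True):
    """Percentage of retained trees supporting each sister group."""
    kept = [tree for tree in trees if keep(tree)]
    if not kept:
        return {}
    counts = Counter(tree.sister_group for tree in kept)
    return {group: 100.0 * n / len(kept) for group, n in counts.items()}
```

Calling `sister_group_percentages(trees)` corresponds to the darkest bar in each group, and `sister_group_percentages(trees, keep=passes_all)` to the lightest.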

Additional file 1, Figure S3

Same as Additional file 1 Figure S2 but using only the A. locustae phylome.

Additional file 1, Figure S4

Same as Additional file 1 Figure S2 but using only the E. bieneusi phylome.

Additional file 1, Figure S5

Same as Additional file 1 Figure S2 but using only the E. cuniculi phylome.

Additional file 1, Figure S6

Same as Additional file 1 Figure S2 but using only the E. intestinalis phylome.

Additional file 1, Figure S7

Same as Additional file 1 Figure S2 but using only the N. ceranae phylome.

Additional file 1, Figure S8

Same as Additional file 1 Figure S2 but using only the N. parisii phylome.

Additional file 1, Figure S9

Super-tree constructed using DupTree [31] from the 3,768 trees reconstructed in the microsporidian phylomes in which at least one member of each predefined group is present.

Additional file 1, Figure S10

Species tree obtained from the concatenated alignment of 53 widespread, single-copy proteins (Additional file 1 Table S6). The alignment was then trimmed to remove non-informative columns and columns that contained gaps for the six microsporidian species considered. The maximum likelihood tree was reconstructed using the CAT40 evolutionary model and the SPR tree topology search method, as recommended in the PhyML [53] manual. A discrete gamma-distribution with four rate categories plus invariant positions was used, estimating the gamma parameter and the fraction of invariant positions from the data. Branch supports are SH-based aLRT statistics. Nodes with support below 1 are marked on the tree.
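The gap-based part of the trimming step can be sketched as below. This is a simplified illustration, not the authors' pipeline: the function name, the dictionary-based alignment representation and the species codes in the example are assumptions, and the removal of non-informative columns is not covered.

```python
def drop_gapped_columns(alignment, focal_names, gap_chars="-"):
    """Remove every column in which any focal sequence has a gap.

    alignment   -- dict mapping sequence name to aligned sequence (equal lengths)
    focal_names -- names of the sequences that must be gap-free in kept columns
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column only if none of the focal sequences has a gap there.
    keep = [
        i for i in range(length)
        if all(alignment[name][i] not in gap_chars for name in focal_names)
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy example with made-up sequences; "Ecu" and "Nce" play the focal species.
toy = {"Ecu": "MK-L", "Nce": "M-AL", "Sce": "MKA-"}
trimmed = drop_gapped_columns(toy, focal_names=["Ecu", "Nce"])
```

Non-focal sequences may still contain gaps in the retained columns, which matches the legend's wording that only gaps in the microsporidian species trigger removal.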

Additional file 1, Figure S11

Species tree obtained from the concatenated alignment of 53 widespread, single-copy proteins (Additional file 1 Table S6). The alignment was then trimmed to remove non-informative columns and columns that contained gaps for the six microsporidian species considered. A Bayesian analysis was performed using PhyloBayes v3.2 [57] with CAT as the evolutionary model. The analysis was performed using two independent MCMC chains with a saving frequency of 100 generations and the following stop criteria: 1) a maximum discrepancy across the bi-partitions (maxdiff) of less than 0.1 and 2) a minimum effective size of at least 100 points for each parameter in the program. Finally, a consensus tree was generated using the majority-consensus rule. Nodes with posterior probability below 1 are marked on the tree.
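The two stop criteria can be expressed as a small check. This is a minimal sketch: the function name and the dictionary of per-parameter effective sizes are hypothetical, and the values are assumed to have been read from the program's diagnostics beforehand. Only the two thresholds come from the legend.

```python
def chains_converged(maxdiff, effective_sizes, maxdiff_limit=0.1, min_effsize=100.0):
    """Return True when both stop criteria from the legend are met.

    maxdiff         -- maximum discrepancy across bi-partitions between chains
    effective_sizes -- dict mapping parameter name to its effective sample size
    """
    if not effective_sizes:
        raise ValueError("at least one parameter is required")
    # Criterion 1: maxdiff strictly below the limit.
    # Criterion 2: every parameter reaches the minimum effective size.
    return maxdiff < maxdiff_limit and min(effective_sizes.values()) >= min_effsize
```

For example, `chains_converged(0.05, {"loglik": 250, "alpha": 100})` is true, whereas a run with any parameter below 100 effective points would be continued.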

Additional file 1, Figure S12

Species tree obtained from the concatenated alignment of 53 widespread, single-copy proteins (Additional file 1 Table S6). The alignment was then trimmed to remove non-informative columns and columns that contained gaps for the six microsporidian species considered. An ML tree, accounting for potential heterotachy, was derived with a free rates parameter covarion model recently implemented in PhyML (provided by S. Guindon). Nodes with support below 1 are marked on the tree.

Additional file 1, Figure S13

Species tree obtained from the concatenated alignment of 53 widespread, single-copy proteins (Additional file 1 Table S6). The alignment was then trimmed to remove non-informative columns and columns that contained gaps for the six microsporidian species considered. The resulting alignment was recoded to a reduced four-letter alphabet [37]. A maximum likelihood tree was derived under a general time reversible (GTR) model as implemented in PhyML [53]. A discrete gamma-distribution with four rate categories plus invariant positions was used, estimating the gamma parameter and the fraction of invariant positions from the data. Branch supports are SH-based aLRT statistics. Nodes with support below 1 are marked on the tree.

Additional file 1, Figure S14

Alternative species tree topologies used for statistical comparisons to the ML topology (Figure 3 in the main paper). All alternative topologies were generated with ETE [55]. Microsporidian species were collapsed in the initial topologies to avoid favoring any internal organization. ML reconstruction on the complete alignment was performed in two steps: the first determines the internal organization of microsporidia using RAxML [59], while the second optimizes tree branch lengths to compute the likelihood of the alternative scenarios. The alternative topologies considered for the position of microsporidia were: A) basal to all fungi, C) grouped with Chytridiomycotina, Z) grouped with Zygomycotina, B) grouped with Basidiomycotina, S) grouped with Saccharomycotina, P) grouped with Pezizomycotina, T) grouped with Taphrinomycotina, S+P) placed at the common ancestor of Saccharomycotina and Pezizomycotina, T+S+P) placed at the base of Ascomycotina, B+T+S+P) placed at the base of Dikarya, Z+B+T+S+P) placed at the common ancestor of Dikarya and Zygomycotina, A - C+Z) basal to all fungi but grouping Zygomycotina and Chytridiomycotina.

Additional file 1, Figure S15

Summary of the results of eight statistical tests comparing twelve alternative species tree topologies (see Additional file 1 Figure S14). These tests were performed on four partitions of the concatenated alignment of 53 proteins (Additional file 1 Table S6). The alignment was trimmed to remove non-informative columns and columns that contained gaps for the six microsporidian species considered. Then, partitions were generated by sequentially removing the 2, 4, 6 and 8 fastest-evolving site categories, as classified by TreePuzzle v5.2 [58]. Finally, the alternative topologies were tested on each separate partition. Dark gray indicates the topology with the best likelihood, while light gray indicates the topologies whose likelihood is not significantly different from the best one, according to a given test. White squares represent those tree topologies that can be confidently rejected according to a given test.
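The partitioning step can be sketched as follows, assuming each alignment column already carries an integer rate category in which larger numbers mean faster evolution. That encoding, the function name and the toy data are assumptions made for illustration; they are not the TreePuzzle output format.

```python
def drop_fastest_sites(alignment, site_categories, n_removed):
    """Remove the columns belonging to the n_removed fastest rate categories.

    alignment       -- dict mapping sequence name to aligned sequence
    site_categories -- one integer rate category per column (larger = faster,
                       an assumption made for this sketch)
    n_removed       -- how many of the fastest categories to discard
    """
    # Rank the categories that actually occur, fastest first.
    ranked = sorted(set(site_categories), reverse=True)
    dropped = set(ranked[:n_removed])
    keep = [i for i, cat in enumerate(site_categories) if cat not in dropped]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy data: four columns assigned to categories 1 (slowest) to 3 (fastest).
toy = {"a": "ABCD", "b": "EFGH"}
categories = [1, 3, 2, 3]
partitions = {n: drop_fastest_sites(toy, categories, n) for n in (1, 2)}
```

With real data, the same call would be repeated for 2, 4, 6 and 8 removed categories to obtain the four partitions on which the topologies were tested.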

Additional file 1, Figure S16

Species tree obtained from the concatenated alignment of 42 widespread, single-copy proteins including all species used in this study (Additional file 1 Tables S4 and S5). The original 53 genes (Additional file 1 Table S6) were used to search for single-copy orthologs in all species, and those present in few species were removed from the concatenated alignment. The resulting alignment was then trimmed to remove non-informative columns and columns that contained gaps for the nine microsporidian species considered. The ML tree was reconstructed using the LG evolutionary model, since it was the best-fitting evolutionary model for 39 of the 42 selected genes, and using SPR as the tree topology search method, as recommended in the PhyML [53] manual. A discrete gamma-distribution with four rate categories plus invariant positions was used, estimating the gamma parameter and the fraction of invariant positions from the data. Branch supports are SH-based aLRT statistics. Nodes with support below 1 are marked on the tree.

Additional file 1, Figure S17

Topology of the tree used as input in the simulations with ROSE [60] (A) and the final tree inferred applying the standard procedure of concatenation and maximum likelihood reconstruction (B).

Additional file 1, Table S1

The number of syntenic pairs as detected with the "relaxed synteny" method (see main text). Columns represent each of the microsporidian genomes. Rows represent the species against which a given microsporidian genome is compared. For each pair of genomes the following information is available: the first column for each microsporidian species gives the total number of homologs shared by the two species; the second column gives the number of syntenic pairs found, without any correction; the third column gives the normalized number of pairs per 1000 shared homologs. The fourth and fifth columns give the same data as the second and third columns, but corrected by counting paralogous pairs only once (see main text).
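The normalization used in the third and fifth columns is a simple ratio, sketched here. The function name and the numbers in the example are illustrative only and are not values from the table.

```python
def pairs_per_thousand(syntenic_pairs, shared_homologs):
    """Number of syntenic pairs per 1000 shared homologs."""
    if shared_homologs <= 0:
        raise ValueError("shared_homologs must be positive")
    return 1000.0 * syntenic_pairs / shared_homologs


# Made-up example: 12 syntenic pairs among 480 shared homologs.
normalized = pairs_per_thousand(12, 480)
```

Normalizing in this way makes the counts comparable between genome pairs that share very different numbers of homologs.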

Additional file 1, Table S2

Table representing the number of pairs of proteins with conserved synteny in two genomes detected with the strict method (see main text). Numbers are normalized in this case by the number of shared orthologs (shared pairs per 1000 shared orthologs). The table follows the same structure as Additional file 1 Table S1.

Additional file 1, Table S3

Summary of the analyses run in this paper and the main conclusions drawn from each analysis. The table is read from left to right, with each subsequent analysis acting on the one located to its left. The last column indicates the main conclusion obtained for a given analysis.

Additional file 1, Table S4

List of species included in the analysis. Columns indicate, in this order, the fungal group, the three-letter code used throughout the analysis, the species name as found in the download site, the source from which the proteomes were downloaded, and the date when the data were acquired. Species belonging to the primary set are shaded; all the rest belong to the secondary set.

Additional file 1, Table S5

List of new species included in the concatenation analyses. Columns indicate, in this order, the taxonomic group, the species code used throughout the analysis, the species name as found in the download site, the source from which the proteomes were downloaded, and the date when the data were acquired.

Additional file 1, Table S6

List of 53 widespread, single-copy proteins used in the concatenation. The E. cuniculi orthologs are listed. Columns represent the UNIPROT accession code, the gene name, the length of the protein and the description of the gene.

Format: PDF. Size: 15.6 MB.
