BMC Evolutionary Biology

Open Access Highly Accessed Research article

Can comprehensive background knowledge be incorporated into substitution models to improve phylogenetic analyses? A case study on major arthropod relationships

Björn M von Reumont1*, Karen Meusemann1, Nikolaus U Szucsich2, Emiliano Dell'Ampio2, Vivek Gowri-Shankar, Daniela Bartel2, Sabrina Simon3, Harald O Letsch1, Roman R Stocsits1, Yun-xia Luan4, Johann W Wägele1, Günther Pass2, Heike Hadrys3,5 and Bernhard Misof6

Author Affiliations

1 Molecular Lab, Zoologisches Forschungsmuseum A. Koenig, Bonn, Germany

2 Department of Evolutionary Biology, University Vienna, Vienna, Austria

3 ITZ, Ecology & Evolution, Stiftung Tieraerztliche Hochschule Hannover, Hannover, Germany

4 Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, PR China

5 Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA

6 UHH Biozentrum Grindel und Zoologisches Museum, University of Hamburg, Hamburg, Germany

BMC Evolutionary Biology 2009, 9:119 doi:10.1186/1471-2148-9-119

Published: 27 May 2009

Additional files

Additional file 1:

Taxa list. Taxa list of sampled sequences. * indicates concatenated 18S and 28S rRNA sequences from different species. For the gene combinations used to construct concatenated sequences of chimeric taxa, see Table S1. ** indicates sequences contributed in the present study (author of sequences).

Format: XLS Size: 123KB

Additional file 2:

LogDet-corrected network of the concatenated 18S and 28S rRNA alignment. LogDet-corrected network plus invariant-sites model (30.79% invariant sites), computed with SplitsTree4 from the concatenated 18S and 28S rRNA alignment after exclusion of randomly similar sections identified with ALISCORE.
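
For readers unfamiliar with the LogDet correction, the following is a minimal Python sketch of one common formulation of the LogDet (paralinear) distance between two aligned nucleotide sequences. It is an illustration only: it is not taken from SplitsTree4, it omits the invariant-sites adjustment mentioned above, and the function names (`divergence_matrix`, `determinant`, `logdet_distance`) are hypothetical.

```python
import math

NUCS = "ACGT"

def divergence_matrix(seq_x, seq_y):
    """Joint frequencies F[i][j] of nucleotide i in x aligned with j in y."""
    counts = [[0.0] * 4 for _ in range(4)]
    n = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in NUCS and b in NUCS:  # skip gaps and ambiguity codes
            counts[NUCS.index(a)][NUCS.index(b)] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return [[c / n for c in row] for row in counts]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

def logdet_distance(seq_x, seq_y):
    """d = -1/4 * [ln det F - 1/2 * sum of ln(row and column marginals)]."""
    f = divergence_matrix(seq_x, seq_y)
    det_f = determinant(f)
    rows = [sum(r) for r in f]
    cols = [sum(f[i][j] for i in range(4)) for j in range(4)]
    if det_f <= 0 or min(rows) <= 0 or min(cols) <= 0:
        raise ValueError("LogDet distance is undefined for this pair")
    log_marginals = sum(math.log(p) for p in rows + cols)
    return -0.25 * (math.log(det_f) - 0.5 * log_marginals)
```

The distance is zero for identical sequences and is undefined when a nucleotide is absent from either sequence or when the determinant is not positive, which is why the sketch raises an error in those cases.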

Format: PDF Size: 53KB

Additional file 3:

Bayesian support values for selected clades. List of Bayesian support values (posterior probability, pP) for selected clades of the time-heterogeneous and time-homogeneous trees.

Format: XLS Size: 79KB

Additional file 4:

Detailed flow of the analysis procedure in the software package PHASE-2.0. Options used in PHASE-2.0 are italicized above the arrows and are followed by input files. Black arrows represent the general flow of the analysis procedure; green arrows show that results or parameter values from a single step were inserted or accessed in a later process. Red block arrows mark the final run of the time-heterogeneous and time-homogeneous approaches with 16 chains each (2 × 118,000,000 generations).

First row: I.) We prepared three control files (control.mcmc) for mcmcphase using three different mixed models. This "pre-run" served as a first model selection (500,000 generations for each setting). We excluded model (C) because its parameter values did not converge. II.) We repeated step I.) with 3,000,000 generations for the two remaining model settings, using similar control files (different number of generations and random seeds). The calculated ln likelihood values of both chains were compared in a Bayes factor test (BFT), which led to the exclusion of mixed model (A). Parameter values of the remaining model (B) were implemented in the time-heterogeneous setting. III.) We started the final analysis (final run) using sixteen chains each for the time-homogeneous and the time-heterogeneous approach. In the final time-homogeneous approach, the control files were similar to those of step II.) except for a different number of generations and random seeds.

Second row: Additional steps were necessary prior to the computation of the final time-heterogeneous chains. We applied mcmcsummarize to the selected mixed model (B) to calculate a consensus tree. We then ran optimizer to obtain an ML estimate of each parameter value (opt.mod) based on the inferred consensus tree and the optimized parameter values (mcmc-best.mod), a data file produced by mcmcphase. The estimated values were implemented in an initial.mod file. The initial.mod file and its parameter values were accessed by the control files of the final time-heterogeneous chains (only topology and base frequencies were estimated).

Third row: Trees were reconstructed separately for the time-homogeneous and time-heterogeneous settings. All chains of each approach were tested in a BFT against the chain with the best lnL. We only included chains with a 2lnB10-value > 10. From these chains we constructed a metachain for each setting using Perl and applied mcmcsummarize to infer the consensus topology. To estimate branch lengths properly we ran mcmcphase; the resulting branch lengths were implemented in the consensus trees. Finally, the layout of both trees was refined using graphics programs (Dendroscope, Adobe Illustrator CS II).
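
The chain comparison described above can be illustrated with a short Python sketch. It assumes that each chain is summarized by the harmonic mean of its sampled likelihoods (Additional files 15 and 16 list harmonic means) and that 2lnB10 is twice the difference between two such log values. The function names and the dictionary input are hypothetical and are not part of PHASE-2.0; the sketch only computes the values and leaves the inclusion criterion to the reader.

```python
import math

def log_harmonic_mean(ln_likelihoods):
    """Log of the harmonic mean of likelihoods, from sampled lnL values.

    ln HM = ln N - ln(sum_i exp(-lnL_i)), evaluated with the
    log-sum-exp trick so large |lnL| values do not overflow.
    """
    n = len(ln_likelihoods)
    if n == 0:
        raise ValueError("no samples")
    negated = [-x for x in ln_likelihoods]
    shift = max(negated)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in negated))
    return math.log(n) - log_sum

def two_ln_bayes_factor(ln_l_1, ln_l_0):
    """2lnB10 for hypothesis 1 over hypothesis 0."""
    return 2.0 * (ln_l_1 - ln_l_0)

def compare_to_best(chains):
    """Compare every chain with the chain that has the best harmonic mean.

    `chains` maps a chain label to its list of sampled lnL values.
    Returns the label of the best chain and a dict of 2lnB10 values.
    """
    means = {name: log_harmonic_mean(s) for name, s in chains.items()}
    best = max(means, key=means.get)
    values = {name: two_ln_bayes_factor(means[best], m)
              for name, m in means.items()}
    return best, values
```

In practice the lnL samples would be read from the mcmcphase output after discarding burn-in; that parsing step is omitted here because it depends on the output files of the specific run.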

Format: PDF Size: 447KB

Additional file 5:

List of chimeric species for concatenated 18S and 28S rRNA sequences

Format: XLS Size: 117KB

Additional file 6:

Primer list 18S rRNA

Format: XLS Size: 104KB

Additional file 7:

Primer list 28S rRNA

Format: XLS Size: 110KB

Additional file 8:

Primer card of the 18S rRNA gene for hexapods, myriapods and crustaceans. Primers used for hexapods or myriapods are shown in the upper part, primers for crustaceans in the lower part. Positions of forward primers are marked with green arrows, those of reverse primers with red arrows. Where different primers with identical positions were used, all primer labels are given at the single arrow for that position. Primers and their combinations are given in Additional files 6 and 11.

Format: PDF Size: 529KB

Additional file 9:

Primer card of the 28S rRNA gene for crustaceans, hexapods and myriapods. Positions of forward primers are marked with green arrows, those of reverse primers with red arrows. Where different primers with identical positions were used, all primer labels are given at the single arrow for that position. Primers and their combinations are given in Additional files 7 and 11.

Format: PDF Size: 577KB

Additional file 10:

Primer card of the 28S rRNA gene for pterygotes. Positions of forward primers are marked with green arrows, those of reverse primers with red arrows. Where different primers with identical positions were used, all primer labels are given at the single arrow for that position. Primers and their combinations are given in Additional files 7 and 11.

Format: PDF Size: 565KB

Additional file 11:

Supplementary Information. Supplementary information for lab work (amplification, purification and sequencing of PCR products).

Format: PDF Size: 83KB

Additional file 12:

PCR temperature profiles

Format: XLS Size: 75KB

Additional file 13:

PCR chemicals

Format: XLS Size: 86KB

Additional file 14:

Settings of exchangeability parameters used in pre-runs. List of the exchangeability parameter settings used in the pre-runs in PHASE-2.0.

Format: XLS Size: 48KB

Additional file 15:

Chains included to infer the time-heterogeneous consensus tree. Number of chains, generations per chain, harmonic means (lnL) and 2lnB10-values of the chains included to infer the time-heterogeneous consensus tree.

Format: XLS Size: 55KB

Additional file 16:

Chains included to infer the time-homogeneous consensus tree. Number of chains, generations per chain, harmonic means (lnL) and 2lnB10-values of the chains included to infer the time-homogeneous consensus tree.

Format: XLS Size: 55KB

Additional file 17:

Localities of sampled taxa

Format: XLS Size: 127KB