Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland

Institute of Systematic Botany, University of Zurich, Zurich, Switzerland

Abstract

Background

Cyanobacteria are one of the oldest and morphologically most diverse prokaryotic phyla on our planet. The early development of an oxygen-containing atmosphere approximately 2.45 - 2.22 billion years ago is attributed to the photosynthetic activity of cyanobacteria. Furthermore, they are one of the few prokaryotic phyla where multicellularity has evolved. Understanding when and how multicellularity evolved in these ancient organisms would provide fundamental information on the early history of life and further our knowledge of complex life forms.

Results

We conducted and compared phylogenetic analyses of 16S rDNA sequences from a large sample of taxa representing the morphological and genetic diversity of cyanobacteria. We reconstructed ancestral character states on 10,000 phylogenetic trees. The results suggest that the majority of extant cyanobacteria descend from multicellular ancestors. Reversals to unicellularity occurred at least 5 times. Multicellularity was established again at least once within a single-celled clade. Comparison to the fossil record supports an early origin of multicellularity, possibly as early as the "Great Oxygenation Event" that occurred 2.45 - 2.22 billion years ago.

Conclusions

The results indicate that a multicellular morphotype evolved early in the cyanobacterial lineage and was regained at least once after a previous loss. Most of the morphological diversity exhibited in cyanobacteria today —including the majority of single-celled species— arose from ancient multicellular lineages. Multicellularity could have conferred a considerable advantage for exploring new niches and hence facilitated the diversification of new lineages.

Background

Cyanobacteria are oxygenic phototrophic prokaryotes from which chloroplasts, the light harvesting organelles in plants, evolved. Some are able to convert atmospheric nitrogen into a form usable for plants and animals. During Earth history, cyanobacteria have raised atmospheric oxygen levels starting approximately 2.45 - 2.22 billion years ago and provided the basis for the evolution of aerobic respiration

Subset of cyanobacterial taxa used for the analyses with GenBank accession numbers for 16S rDNA sequences

Table columns: **unicellular strains**, **accession numbers**, **multicellular strains**, **accession numbers**. Rows are grouped by morphological section: **Section I** and **Section II** (unicellular), **Section III**, **Section IV** and **Section V** (multicellular), followed by **Eubacteria**.

^{1}species used to test substitutional saturation

Different interpretations of multicellularity are currently used

Some of the oldest body fossils unambiguously identified as cyanobacteria have been found in the Kasegalik and McLeary Formations of the Belcher Supergroup, Canada, and are estimated to be between 1.8 billion and 2.5 billion years old

Phylogenetic analyses of cyanobacteria have gained in quantity over the past 20 years

If one studies phylogenetic relationships based on protein coding genes in bacteria, it is possible to encounter the outcome of horizontal gene transfer (HGT)

The aim of this paper is to use molecular phylogenetic methods to address the evolutionary history of cyanobacteria and the evolution of multicellularity. For this purpose, we established a phylogeny based on 16S rDNA sequences belonging to 1,254 cyanobacterial taxa. From that phylogeny we sampled 58 cyanobacterial taxa that represent all main clades obtained and all five sections described by Castenholz

Results and Discussion

Phylogenetic analysis

Phylogenetic analyses of all identified cyanobacteria

To infer the evolution of multicellularity in cyanobacteria we carried out several phylogenetic analyses. To ensure a correct taxon-sampling, a phylogeny containing 1,254 16S rDNA sequences of cyanobacteria obtained from GenBank was reconstructed (Figure

Phylogenetic tree of 1,254 cyanobacterial species

**Phylogenetic tree of 1,254 cyanobacterial species**. Maximum likelihood phylogram of cyanobacteria, based on GTR+G+I substitution model. Six eubacterial species form an outgroup. The ingroup contains 1,254 cyanobacterial strains and six different chloroplast sequences. Bootstrap values (> 50%) calculated from 100 re-samplings are displayed at the nodes. Colors define major morphological characters in the groups. Yellow are single-celled cyanobacteria of section I; orange single-celled from section II; green are multicellular, undifferentiated cyanobacteria from section III; blue are multicellular and differentiated bacteria from section IV; and pink from section V. Sections as described by Castenholz 2001

Phylogenetic analyses to identify an outgroup

Rooted and unrooted phylogenetic analyses, reconstructed with maximum likelihood and Bayesian inference and based on 16S rRNA gene sequences of 27 eubacterial species including 5 cyanobacteria, revealed congruent results. Cyanobacteria form a monophyletic group. Figure

Unrooted Bayesian consensus tree of Eubacteria including five cyanobacterial species

**Unrooted Bayesian consensus tree of Eubacteria including five cyanobacterial species**. Unrooted phylogenetic tree of 16S rRNA gene sequences from 27 eubacterial species reconstructed using Bayesian methods. Posterior probabilities (black) and bootstrap values (red) from 100 re-samplings are displayed at the nodes. Cyanobacteria, represented by 5 species, form a monophyletic group with

**Rooted Bayesian consensus tree of 27 eubacterial species including five cyanobacterial species**. Bayesian analysis of 16S rRNA gene sequences from 27 Eubacteria, based on GTR+I+G substitution model with an archaean outgroup. Posterior probabilities (black) and bootstrap values (red) from 100 re-samplings are displayed at the nodes. Cyanobacteria (blue-green box) are strongly supported as a monophyletic group with

We separately tested each of the 22 eubacterial species originating from a diverse set of non-cyanobacterial phyla, with a subset of the cyanobacteria (58 taxa). The latter were chosen from the large dataset containing 1,254 taxa, and cover all sub-groups of the tree (Table

Results of six phylogenetic trees are displayed in Figure **E**ntire five sections(A-D)), AC and C (nomenclature as described for the large tree; Figure

Bayesian consensus trees of cyanobacterial subset using different eubacterial outgroups

**Bayesian consensus trees of cyanobacterial subset using different eubacterial outgroups**. Six out of 22 phylogenetic trees reconstructed with Bayesian inference. For each tree an outgroup from a different eubacterial phylum was chosen. Posterior probabilities are displayed at the nodes. Green color represents multicellular cyanobacteria from section III, green-yellow gradient covers species from unicellular section I and multicellular section III, and purple depicts all five different morphological sections present in cyanobacteria. The majority of outgroups exhibit a similar tree topology. For further analyses

**Bayesian consensus trees of cyanobacterial subset and different outgroups - newick format**. 22 Bayesian consensus trees with posterior probabilities of a cyanobacterial subset (58 taxa) and different eubacterial outgroups, displayed in newick format. Trees were run for 10,000,000 generations using a GTR+I+G substitution model with the first 3,000,000 generations being discarded as a burn-in.

In total 14 trees showed congruent topologies. From the 14 eubacteria which have been used as an outgroup in these trees, we chose

Phylogenetic analyses of a cyanobacterial subset

Phylogenetic analyses of 16S rRNA gene sequences from a subset of 58 cyanobacterial taxa were conducted using maximum likelihood (Additional File

**Maximum likelihood tree of cyanobacterial subset**. Maximum likelihood analysis of 16S rDNA sequences from 58 cyanobacteria, based on GTR+G+I substitution model, with

Phylogenetic tree of a cyanobacterial subset

**Phylogenetic tree of a cyanobacterial subset**. Bayesian consensus cladogram of 16S rDNA sequences from 58 cyanobacterial strains, based on GTR+G+I substitution model, with

**Results from the test of substitutional saturation**. Substitutional saturation of the sequences was tested using DAMBE software. The index of substitutional saturation is smaller than the estimated critical value irrespective of the symmetry of the tree. The sequences are therefore not saturated.

A general substitution model (GTR+G+I) was applied for both analyses. Results of the maximum likelihood and Bayesian methods are highly congruent. The result of the Bayesian analysis, with posterior probabilities (black) and bootstrap values (red) displayed at the nodes, is shown in Figure

Cyanobacteria form three distinct clades mentioned earlier (Figure

The tree from Figure 2 in Honda

In Turner

Swingley

Monophyly of section V (the branching, differentiated cyanobacteria) shown in our tree agrees with Turner

Ancestral character state reconstruction

Our analysis indicates that multicellularity is a phylogenetically conservative character (p-value < 0.01). If the terminal taxa of the Bayesian consensus tree are randomly re-shuffled, a count through 1,000 re-shuffled trees gives an average of 20 transition steps. However, an average of only nine parsimonious transitions was observed in a count through 10,000 randomly sampled trees of our ancestral character state reconstruction.
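The logic of this shuffling test can be illustrated with a minimal sketch. It is not the Mesquite procedure used in the study: the eight-taxon tree, the tip states and the function names below are hypothetical, the step count uses Fitch parsimony on a single fixed tree rather than a sample of 10,000 trees, and the p-value is a simple one-sided permutation estimate.

```python
import random

# Hypothetical toy tree as nested tuples; leaves are taxon names.
TREE = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# Hypothetical tip states: 1 = multicellular, 0 = unicellular.
STATES = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "G": 0, "H": 0}


def fitch_steps(node, states):
    """Return (state set, parsimony steps) for a binary tree (Fitch's algorithm)."""
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_steps = fitch_steps(node[0], states)
    right_set, right_steps = fitch_steps(node[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra step needed.
        return common, left_steps + right_steps
    # Children disagree: take the union and count one transition.
    return left_set | right_set, left_steps + right_steps + 1


def shuffle_test(tree, states, n_perm=1000, seed=1):
    """Compare observed steps with steps after shuffling states among the tips.

    Shuffling keeps the proportion of the two states unaltered.
    """
    rng = random.Random(seed)
    observed = fitch_steps(tree, states)[1]
    taxa = list(states)
    values = [states[t] for t in taxa]
    null_steps = []
    for _ in range(n_perm):
        rng.shuffle(values)
        null_steps.append(fitch_steps(tree, dict(zip(taxa, values)))[1])
    # One-sided p-value: share of shuffles needing as few steps as observed.
    p_value = (1 + sum(s <= observed for s in null_steps)) / (n_perm + 1)
    return observed, sum(null_steps) / n_perm, p_value


observed, null_mean, p_value = shuffle_test(TREE, STATES)
print(observed, round(null_mean, 2), round(p_value, 3))
```

A character counts as phylogenetically conservative in this sense when the observed number of steps is clearly smaller than the number typically required after shuffling, which is the pattern reported above (nine observed versus an average of 20 after shuffling).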

Results of the character state reconstruction using the AsymmMK model with transition rates estimated by Mesquite 2.71 ^{3}) searches of the Bayesian tree reconstruction.

Ancestral character state reconstruction using maximum likelihood

**Ancestral character state reconstruction using maximum likelihood**. Ancestral character state reconstruction with maximum likelihood analysis, using the "Asymmetrical Markov k-state 2 parameter"(AsymmMk) model implemented in Mesquite 2.71

Cyanobacteria share a unicellular ancestor, but multicellularity evolved early in the cyanobacterial lineage. We identified multicellular character states for three basic ancestors leading to clades E, AC and C in our tree. Together, these clades encompass the entirety of the morphological sections II, III, IV and V. Additionally character states were reconstructed using maximum likelihood analysis and fixed transition rates to analyze properties of the data set. Transition rates are presented in Table

Different transition rates with which ancestral character states were estimated.

| **method rates** | **Maximum likelihood analysis** | | | | | | | | **Bayesian analysis** |
|---|---|---|---|---|---|---|---|---|---|
| | **AsymmMK ^{1}** | **MK1 ^{2}** | **F1 ^{3}** | **F2** | **F3** | **F4** | **F5** | **F6** | **rjhp ^{4}** |
| **fw** ^{5} | 1.62 | 2.67 | 0.90 | 2.70 | 5.40 | 0.45 | 0.90 | 2.70 | 2.881 |
| **bw** ^{6} | 2.99 | 2.67 | 2.70 | 0.90 | 0.45 | 5.40 | 0.90 | 2.70 | 2.873 |

^{1}Asymmetrical Markov k-state 2 parameter model; rates estimated from the consensus tree

^{2}Markov k-state 1 parameter model; rates estimated from the consensus tree

^{3}F1-F6: Models using different fixed transition rates

^{4}reversible jump for model selection, using a hyper prior

^{5}forward rate describing changes to multicellularity

^{6}backward rate describing changes back to a unicellular state

Ancestral character states of nodes 3, 4 and 5 using different transition rates and methods.

| **method** | **model** | | **node 3 state1** | **node 3 state0** | **node 4 state1** | **node 4 state0** | **node 5 state1** | **node 5 state0** |
|---|---|---|---|---|---|---|---|---|
| **ML** ^{1} | AsymmMK | estimated^{3} | 0.88 | 0.12 | 0.91 | 0.08 | 0.95 | 0.05 |
| | | F1 | 0.96 | 0.04 | 0.98 | 0.02 | 0.99 | 0.01 |
| | | F2 | 0.87 | 0.12 | 0.91 | 0.09 | 0.94 | 0.06 |
| | | F3 | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 |
| | | F4 | 0.88 | 0.12 | 0.92 | 0.08 | 0.95 | 0.05 |
| | MK1 | estimated^{3} | 0.79 | 0.21 | 0.83 | 0.17 | 0.90 | 0.10 |
| | | F5 | 0.88 | 0.12 | 0.90 | 0.10 | 0.93 | 0.07 |
| | | F6 | 0.79 | 0.21 | 0.83 | 0.17 | 0.90 | 0.10 |
| **MP** ^{2} | | | 0.6805 | 0.0013 | 0.6799 | 0.0014 | 0.6871 | 0.0014 |
| **BA** ^{3} | rjhp | | 0.915 | 0.0851 | 0.817 | 0.183 | 0.902 | 0.0980 |

^{1}Maximum likelihood: Average frequencies across trees were calculated

^{2}Maximum parsimony: Uniquely best states across trees were counted

^{3}Bayesian analysis: model parameters estimated based on the data

The maximum likelihood analysis is not contradicted by a Maximum Parsimony optimization (Table ^{3}) runs of the Bayesian tree reconstruction. The relative probabilities for a multicellular ancestor at nodes 3, 4 and 5 are 0.68, 0.68 and 0.69, respectively. In contrast, the relative probabilities for a unicellular ancestor at nodes 3, 4 and 5 under parsimony reconstruction are 0.0013, 0.0014 and 0.0014, respectively.

**Ancestral character state reconstruction using maximum parsimony**. Summary of results over 10,000 randomly sampled trees from the Bayesian analysis. Uniquely best states were counted and are shown on the Bayesian consensus tree. Possible states are unicellular (yellow) and multicellular (black). At the nodes, probabilities for each character state are represented with a pie chart. The white part of each pie chart indicates the fraction of trees in which the node was absent; grey parts indicate the fraction of trees in which both states were equally likely. Nodes where transitions occurred are labelled with an asterisk if they show strong support in the phylogenetic analyses. The maximum parsimony analysis produced a result similar to that of the maximum likelihood analysis. A unicellular ancestry for the most recent common ancestor of all cyanobacteria is supported. Nodes 3, 4 and 5 are most frequently optimized as multicellular: multicellularity was estimated for nodes 3 and 4 in approximately 6,800 trees and for node 5 in approximately 6,900 trees. In contrast, a single-celled state was reconstructed for node 3 in 13 out of 10,000 trees and for nodes 4 and 5 in 14 out of 10,000 trees. Five reversals to unicellularity and at least one reversal to multicellularity can be detected.

Using Bayesian methods, a similar pattern is observed for these nodes. Bayes factors indicated that the best-fitting evolutionary model was a "hyperprior" approach with exponential prior distributions whose means were sampled from a uniform distribution between 0 and 10. Transition rates were estimated to be almost equal. Figure

Ancestral character states of nodes 3, 4 and 5 using Bayesian analysis

**Ancestral character states of nodes 3, 4 and 5 using Bayesian analysis**. Posterior probability distribution for a unicellular character state (yellow) and a multicellular character state (black) at nodes 3, 4 and 5 from 10,000 Bayesian trees. 2 × 5,000 trees were randomly sampled from two (MC^{3}) searches. Analysis was performed using BayesTraits. Posterior distributions were derived from a reversible jump MCMC search of 30 million iterations using a hyperprior approach. The probability of a multicellular ancestry is shifted towards 1 for each of the three nodes.
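To make the hyperprior idea concrete, the following sketch draws transition rates from an exponential prior whose mean is itself drawn from a uniform distribution between 0 and 10, as described in the text. It only illustrates the shape of the prior; it does not reproduce the reversible jump MCMC search in BayesTraits, and the function name and number of draws are hypothetical.

```python
import random


def sample_rate_from_hyperprior(rng, low=0.0, high=10.0):
    """Draw one transition rate from an exponential prior whose mean is
    sampled from a uniform(low, high) hyperprior."""
    mean = rng.uniform(low, high)
    # Guard against a mean of exactly zero, which has no valid exponential.
    while mean == 0.0:
        mean = rng.uniform(low, high)
    # random.expovariate expects the rate parameter, i.e. 1 / mean.
    return rng.expovariate(1.0 / mean)


rng = random.Random(42)
draws = [sample_rate_from_hyperprior(rng) for _ in range(20000)]
prior_mean = sum(draws) / len(draws)
print(round(prior_mean, 2))
```

Because the exponential mean is averaged over the uniform hyperprior, the expected rate under this prior is 5, with a long right tail that still allows much larger rates.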

At least five reversals to unicellularity occurred in the tree, three of them within clade AC. The first transition occurred on a branch which led to a group of thermophilic cyanobacteria:

It is very likely that at least one additional reversal to unicellularity occurred in clade E1, but phylogenetic support is not high enough to locate the exact position of this transition. Similarly, support for the nodes where the other transition to multicellularity within clade E occurred is missing. The exact locations of reversals within clade E therefore are not certain and a scenario where multiple reversals occurred cannot be excluded. In clade E, there is also a reversal to multicellularity observed in

Stucken

The majority of cyanobacteria living today are described as successful ecological generalists growing under diverse conditions

Gaining and losing multicellularity

In eukaryotes, simple multicellular forms build the foundation for the evolution of complex multicellular organisms. Although complex multicellularity exhibiting more than three cell types is presumably missing in prokaryotes, bacteria invented simple multicellular forms possibly more than 1.5 billion years earlier than eukaryotes

Prokaryotic fossil record before the "Great Oxygenation Event": Evidence for multicellular cyanobacteria?

Various claims exist for life during the early Archean Eon, more than 3.00 billion years ago. Most of them come from two regions: the Barberton Greenstone Belt, South Africa (around 3.20-3.50 billion years old) and the Pilbara Craton, Western Australia (around 2.90-3.60 billion years old). For some of these "fossils" a biological origin is questioned

Timeline with prokaryotic fossil record

**Timeline with prokaryotic fossil record**. Timeline with geological events (A) and prokaryotic fossil record (B). (A) Formation of Earth

Some late Archean fossils show an oscillatorian or chroococcacean morphotype (Figure

The first conclusive cyanobacterial fossils from all five sections have been reported from rocks around 2.15 billion years old. In 1976, Hofmann described microfossils from stromatolitic dolomite stones in the Kasegalik and McLeary Formations of the Belcher Supergroup in Hudson Bay, Northern Canada. Among these fossils are

Several studies have assessed prokaryotic history using phylogenetic dating methods

Conclusions

Cyanobacteria, photosynthetic prokaryotes, are one of the oldest phyla still alive on this planet. Approximately 2.20-2.45 billion years ago cyanobacteria raised the atmospheric oxygen level and established the basis for the evolution of aerobic respiration

Multicellular prokaryotic fossils from the Archean Eon are documented

In terms of cell types, cyanobacteria reached their maximum morphological complexity around 2.00 billion years ago

Figure

Schematic illustration of cyanobacterial evolution

**Schematic illustration of cyanobacterial evolution**. Numbers at the nodes indicate Bayesian posterior probabilities (black) and bootstrap values (red) from the phylogenetic analyses. The most recent common ancestor of all cyanobacteria is optimized to have been unicellular. All cyanobacteria derive from a unicellular most recent common ancestor (node 1). The lineage leading to

Methods

Taxon sampling

A total of 2,065 16S rRNA gene sequences from the phylum cyanobacteria were downloaded from GenBank. Unidentified and uncultured species were excluded. With this large dataset phylogenetic reconstructions were conducted as described in the next section. Aside from cyanobacteria, the dataset included six chloroplast sequences and six eubacterial sequences:

From this large tree a subset of 58 cyanobacterial sequences was selected for further analyses. Accession numbers are provided in Table

An outgroup for further analyses was chosen from a set of eubacterial, non-cyanobacterial species whose 16S rRNA gene sequences were downloaded from GenBank (Table

Non-cyanobacterial species used in this study with GenBank accession numbers for 16S rDNA sequences

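| **Phyla ^{1}** | **species** | **accession numbers** |
|---|---|---|
| **EUBACTERIA** | | |
| Acidobacteria | | |
| Actinobacteria | | |
| Aquificae | | |
| Bacteroidetes | | |
| Chlamydiae/Verrucomicrobia | | |
| Chlamydiae/Verrucomicrobia | | |
| Chlorobi | | |
| Chloroflexi | | |
| Chrysiogenetes | | |
| Deferribacteres | | |
| Deinococcus-Thermus | | |
| Dictyoglomi | | |
| Fibrobacteres | | |
| Firmicutes | | |
| Fusobacteria | | |
| Gemmatimonadetes | | |
| Nitrospirae | | |
| Planctomycetes | | |
| Proteobacteria | | |
| Spirochaetes | | |
| Thermodesulfobacteria | | |
| Thermotogae | | |
| **ARCHAEA** | | |
| Nanoarchaeota | | |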

^{1}taxonomy as described at

Phylogenetic analyses

Phylogenetic analyses of all identified cyanobacteria

The 2,065 16S rRNA gene sequences were aligned using the software MAFFT

**Phylogenetic tree of cyanobacteria - newick format**. Phylogenetic tree of 1,254 cyanobacterial sequences including six chloroplasts and six Eubacteria analyzed using maximum likelihood analysis with a GTR+G+I estimated substitution model, conducted with the software RAxML.

**Taxon names of the phylogenetic tree of cyanobacteria**. Species names used in the phylogenetic analysis conducted with RAxML software. Taxon names are ordered by sub-groups as in Figure

Phylogenetic analyses to identify an outgroup

To test different outgroups, phylogenetic trees were reconstructed using all sampled non-cyanobacterial species (Table ^{3}) searches with four chains, three heated and one cold, were run. The analyses started with a random tree and were run for 5,000,000 generations. Trees and parameters were sampled every 100th generation. The trees were checked to show a standard deviation of split frequencies below 0.05. The first 3,000,000 generations were excluded as the burn-in.

Additionally phylogenetic analyses were conducted with Bayesian inference, using each of the 22 eubacterial species separately with the sampled cyanobacterial subset (58 taxa). Alignments were built using Clustal-X software with default settings ^{3}) searches were run for 10,000,000 generations using M

Phylogenetic analyses of a cyanobacterial subset

Sequence alignments of the 16S rRNA gene sequences from the cyanobacterial subset and

Phylogenetic reconstruction was carried out using Bayesian analysis and maximum likelihood. Maximum likelihood analysis was performed using GARLI 0.96

Bayesian analysis was conducted running two (MC^{3}) searches, each with four chains, one cold and three heated. Starting with a random tree, analyses were run for 16,616,000 generations each, with trees being sampled every 100th generation. The trees were checked for convergence of parameters (standard deviation of split frequencies below 0.01, effective sample sizes above 200, potential scale reduction factor equal to 1.0) using Tracer v1.4.1
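One of the convergence criteria listed above, the potential scale reduction factor, can be sketched as follows. This is a simplified Gelman-Rubin calculation for a single parameter without the degrees-of-freedom correction, not the implementation in any particular phylogenetics package; the simulated chains and their lengths are hypothetical.

```python
import random
from statistics import mean, variance


def psrf(chains):
    """Simplified potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # average within-chain variance
    between = n * variance(chain_means)          # between-chain variance
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(7)
# Two hypothetical runs sampling the same distribution: value close to 1.0.
converged = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
# Two hypothetical runs stuck in different regions: value well above 1.0.
stuck = [
    [rng.gauss(0.0, 1.0) for _ in range(2000)],
    [rng.gauss(3.0, 1.0) for _ in range(2000)],
]
print(round(psrf(converged), 3), round(psrf(stuck), 3))
```

Values close to 1.0 indicate that independent runs have sampled the same distribution for that parameter; values clearly above 1.0 suggest that the runs should be extended.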

Ancestral character state reconstruction

Character state reconstructions were performed using maximum parsimony (MP; Additional File ^{3 }run were randomly chosen from the post burn-in Bayesian sample and combined. Discrete characters were coded into multicellular or unicellular states. The results over 10,000 Bayesian trees were summarized and displayed on the consensus tree of the Bayesian analysis. For maximum likelihood estimates, both the "Markov k-state 1 parameter model" (MK1 model) and "Asymmetrical Markov k-state 2 parameter model" (AsymmMK model) were applied. Rate of change is the only parameter in the MK1 model. The AsymmMK model exhibits two parameters, describing the forward and backward transitions between states. Phylogenetic conservativeness of multicellularity was tested by comparing the observed distribution of parsimony steps across 10,000 randomly chosen trees from the Bayesian analysis against the distribution from 1,000 trees modified from the Bayesian consensus by randomly shuffling the terminal taxa, while keeping the relative proportion of states unaltered. The root was assumed to be at equilibrium. Transition rates for the MK1 and AsymmMK model were estimated by the program. Rates for the latter models presented in Table
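The difference between the two likelihood models can be made explicit with a small sketch of the two-state continuous-time Markov process that both describe. The rates are the AsymmMK estimates from the transition rate table (forward 1.62, backward 2.99); the branch length `T` and the function names are hypothetical, and the code uses the standard closed-form solution for two states rather than Mesquite's implementation.

```python
import math


def two_state_transition_probs(fw, bw, t):
    """Transition probabilities of a two-state continuous-time Markov model.

    fw: rate of change 0 -> 1 (unicellular to multicellular)
    bw: rate of change 1 -> 0 (multicellular to unicellular)
    Returns P with P[i][j] = Pr(state j after time t | state i at time 0).
    """
    total = fw + bw
    decay = math.exp(-total * t)
    p01 = fw / total * (1.0 - decay)
    p10 = bw / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def equilibrium_frequencies(fw, bw):
    """Long-run frequencies of state 0 and state 1."""
    total = fw + bw
    return [bw / total, fw / total]


FW, BW = 1.62, 2.99  # AsymmMK rates from the transition rate table
T = 0.1              # hypothetical branch length
P = two_state_transition_probs(FW, BW, T)
pi = equilibrium_frequencies(FW, BW)
print(P)
print(pi)
```

Setting `fw == bw` recovers the one-parameter MK1 model, for which both equilibrium frequencies are 0.5; with unequal rates the equilibrium assumed at the root shifts towards the state that is lost more slowly.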

The character states of nodes 3, 4 and 5 of the Bayesian consensus tree were additionally estimated using a reversible jump MCMC search as implemented in BayesTraits

Authors' contributions

BES and HCB conceived the study; BES gathered data and conducted analyses; BES, HCB, AA designed research and wrote the paper. All authors read and approved the final manuscript.

Acknowledgements

We would like to thank Elena Conti, Brian R. Moore and Jurriaan M. de Vos for helpful comments on an earlier version of our manuscript. Furthermore, we would like to thank Marco Bernasconi whose comments on the final version were of great help, and Jurriaan M. de Vos for help with the software BayesTraits.