Centre for Macroevolution and Macroecology, Division of Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, A.C.T. 0200, Australia

Abstract

Background

Recent research has indicated a positive association between rates of molecular evolution and diversification in a number of taxa. However, debate continues concerning the universality and cause of this relationship. Here, we present the first systematic investigation of this relationship within the mammals. We use phylogenetically independent sister-pair comparisons to test for a relationship between substitution rates and clade size at a number of taxonomic levels. Total, non-synonymous and synonymous substitution rates were estimated from mitochondrial and nuclear DNA sequences.

Results

We found no evidence for an association between clade size and substitution rates in mammals, for either the nuclear or the mitochondrial sequences. We found significant associations between body size and substitution rates, as previously reported.

Conclusions

Our results present a contrast to previous research, which has reported significant positive associations between substitution rates and diversification for birds, angiosperms and reptiles. There are three possible reasons for the differences between the observed results in mammals versus other clades. First, there may be no link between substitution rates and diversification in mammals. Second, this link may exist, but may be much weaker in mammals than in other clades. Third, the link between substitution rates and diversification may exist in mammals, but may be confounded by other variables.

Background

Diversification is the net outcome of speciation and extinction. Clade size, the current species richness of a lineage, is a measure of net diversification because it is the result of the addition of species through speciation and the removal by extinction. A number of recent studies have shown positive relationships between rates of molecular evolution and net diversification. A positive relationship between substitution rates and species richness has been reported in angiosperms

There are a number of possible causes of a relationship between rates of molecular evolution and net diversification. It has been suggested that elevated substitution rates in diverging populations are the result of changes to the selective and demographic landscape that accompany speciation

However, the majority of studies that report a link between net diversification and substitution rates focus on genes that are not obviously associated with traits under strong positive selection during speciation events. Rather, they tend to be based on "house-keeping" genes, such as metabolic genes (e.g.

It has been suggested that the process of speciation may cause increases in genome-wide substitution rates _{e}

It is also possible the link between net diversification and rates of molecular evolution could be caused by differences in mutation rates between lineages. For instance, higher mutation rates, and subsequently elevated substitution rates, may lead to a more rapid acquisition of hybrid incompatibilities in diverging populations

Finally, there may be no direct causal link between rates of molecular evolution and net diversification. Instead, the association between them may be caused indirectly by co-variation between molecular evolutionary rates, diversification and other traits and processes. Shorter generation time, higher fecundity and shorter life-spans have all been linked to substitution rates in mammals

Methodological artifacts could also cause an association between rates of molecular evolution and diversification. For example, it has been suggested that the node density effect, where molecular branch-lengths which pass through more nodes tend to be longer, could be responsible for the association between rates of molecular evolution and diversification in some studies

Mammals provide an ideal opportunity to investigate the generality and potential direction of causality of the relationship between net diversification and rates of molecular evolution. A considerable amount of research has been conducted investigating the relationship between substitution rate variation and life history in mammals

In this study, we use phylogenetically independent comparisons of sister clades to test for an association between substitution rate and clade size in mammals. Using protein-coding genes from both nuclear and mitochondrial genomes, we test for a relationship between clade size and total substitution rates (

These measures provide a way in which to examine the different processes that may cause rates of molecular evolution to co-vary with clade size. Synonymous mutations do not change the encoded amino acid sequences, and while not necessarily neutral _{e}

If positive selection or reductions in N_{e}

Methods

Sister-Pairs

We used phylogenetically independent

We used published phylogenies to select our phylogenetically independent sister-pairs and their nearest available out-groups. We excluded any potential sister-pairs for which a reciprocally monophyletic relationship between the two clades was not well supported in the literature. References in support of each sister-pair in our analyses are included in Additional File

**Nuclear and Mitochondrial Data**. Excel spreadsheet containing substitution rate estimates, estimates of body size differences between sister-pairs, estimates of species number (clade size), Accession Numbers and references.


Mitochondrial Sister-Pairs and Sequence Data

For our mitochondrial analyses we investigated the relationship between clade size and substitution rates using 28 sister-pairs of clades, corresponding approximately to family level contrasts. Our mitochondrial dataset also provided the additional opportunity to perform analyses on deeper (n = 9 pairs) and shallower (n = 27) sister-pairs of clades, to test whether the relationship between clade size and substitution rate differed with the taxonomic level of the clades

For mitochondrial analyses, we used all protein coding genes from the heavy strand of whole mitochondrial genomes available from GenBank (

To avoid the node density effect in maximum likelihood substitution rate estimates


**Sequence selection methods**. A sister-pair comprising more speciose (green) and less speciose (red) clades. Coloured taxa indicate those for which sequence data are available. Using our methods, Taxon D is selected for analysis, because its root-to-tip branch is separated from the basal node by 6 nodes, compared to 3 for Taxon L. By contrast, Taxon A is selected for analysis using our methods because its root-to-tip branch is separated from the basal node by two internal nodes, compared to one for Taxon C. In both cases, a large component of the sequences is shared by other members of the respective clades over the whole molecular branch length, relative to the sister clade.

Where more than one mitochondrial genome sequence was available on GenBank for a given clade, we selected the sequences based on the number of internal nodes in the published molecular phylogenies used to select the sister clades. In the more speciose clade, we chose the sequence with the greatest number of internal nodes. In the less speciose clade, we selected the sequence with the fewest internal nodes (shown in Figure
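The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` class, the `pick_representative` function and the taxon names and node counts are hypothetical and are not taken from the study's data or scripts.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    taxon: str           # species for which a sequence is available
    internal_nodes: int  # internal nodes between the clade's basal node and this tip


def pick_representative(candidates, more_speciose):
    """Pick one sequence per clade using the internal-node count.

    More speciose clade: the tip with the most internal nodes.
    Less speciose clade: the tip with the fewest internal nodes.
    """
    if not candidates:
        raise ValueError("no sequences available for this clade")
    by_nodes = lambda c: c.internal_nodes
    if more_speciose:
        return max(candidates, key=by_nodes)
    return min(candidates, key=by_nodes)


# Hypothetical sister-pair; names and counts are made up for illustration.
rich_clade = [Candidate("species_1", 5), Candidate("species_2", 2)]
poor_clade = [Candidate("species_3", 2), Candidate("species_4", 1)]

chosen_rich = pick_representative(rich_clade, more_speciose=True)
chosen_poor = pick_representative(poor_clade, more_speciose=False)
print(chosen_rich.taxon, chosen_poor.taxon)
```

Ties are resolved by list order here; the text does not say how ties were handled, so that behaviour is an assumption of the sketch.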

Nuclear Sister-Pairs and Sequence Data

For our nuclear data, we investigated the relationship between substitution rate and clade size using 31 sister-pairs of clades, corresponding to approximately family-level contrasts. We also tested for relationships between clade size and substitution rate within specific groups of mammals, as it has been shown that patterns of substitution rate variation and patterns of diversification can differ between these groups

For our nuclear analyses, we used nuclear genes obtained from GenBank. There was a substantial trade-off between taxonomic and genetic coverage for nuclear gene sequences. In order to optimise both of these (and thus optimise power in subsequent regression analyses), different sets of nuclear genes were chosen for different groups. Our whole mammalian analysis (n = 31) included

As with our mitochondrial analysis, to reduce the impact of the node density effect in maximum likelihood substitution rate estimates we used a single representative nuclear gene sequence for each clade. We used the same selection criteria for selecting our sequences where more than one sequence was available for a gene within a given clade. In some instances, we were unable to obtain all nuclear gene sequences from a single species to represent a given clade. In these instances, we constructed chimeric sequences, where gene sequences were sourced from different species within a single clade. In doing so, we selected species that were as closely related as possible.
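As a rough sketch of how a chimeric sequence of this kind might be assembled, the Python below concatenates one sequence per gene for a clade, preferring a single anchor species and otherwise falling back to the closest available relative. The function name `build_chimera`, the `distance` table (a stand-in for whatever measure of relatedness is used) and the gene and species names are all hypothetical; the study does not describe its procedure at this level of detail.

```python
def build_chimera(genes, available, anchor, distance):
    """Concatenate one aligned sequence per gene for a single clade.

    genes:     ordered list of gene names
    available: {gene: {species: aligned sequence}}
    anchor:    species used whenever it has the gene
    distance:  {species: distance from the anchor}, a hypothetical
               measure of relatedness (smaller means more closely related)
    """
    parts = []
    sources = {}
    for gene in genes:
        by_species = available.get(gene, {})
        if not by_species:
            raise ValueError(f"no sequence for {gene} in this clade")
        # The anchor is treated as distance 0, so it wins whenever present.
        donor = min(
            by_species,
            key=lambda sp: 0.0 if sp == anchor else distance[sp],
        )
        parts.append(by_species[donor])
        sources[gene] = donor
    return "".join(parts), sources


# Made-up example data, for illustration only.
example_available = {
    "geneA": {"sp1": "ATG", "sp2": "ATA"},
    "geneB": {"sp2": "CCC", "sp3": "CCT"},
}
example_distance = {"sp2": 0.1, "sp3": 0.4}
chimera, donors = build_chimera(["geneA", "geneB"], example_available, "sp1", example_distance)
print(chimera, donors)
```

Returning the donor species alongside the sequence keeps a record of which genes came from which species, which makes the composition of each chimera easy to report.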

Substitution Rate Estimates

We used

**Phylogenies**. PDF document containing phylogenies used for all analyses described in the main text.


Clade Size

We used extant clade size as a measure of net diversification for our analyses. Previous research investigating these relationships has used varied metrics to represent diversification, including extant clade size

Body size

Substitution rates in mammals are known to be influenced by a number of life history variables, including generation time

We calculated body mass contrasts for each sister pair used in our analyses. We obtained body mass values for most species in each clade from the panTHERIA database

**Body Mass Data**. PDF document containing body mass data and references additional to those sourced from the panTHERIA life history database


We used the maximum likelihood estimator (MLE) of Welch and Waxman

Statistical Tests

Testing for Substitution Rate Variation

We tested whether our alignments contained significant variation in substitution rates between terminal lineages. We compared the likelihoods of two models: an equal-rate model, where terminal branches within a pair are constrained to have equal substitution rates, but substitution rates are allowed to vary between pairs; and a free-rate model, where a separate substitution rate is estimated for each terminal branch. We calculated the likelihood of each of these models using the phylogenies shown in Additional File
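Because the equal-rate model is nested within the free-rate model, one common way to compare two such likelihoods is a likelihood ratio test. The sketch below assumes that approach, which the text does not explicitly confirm, and assumes the test statistic is referred to a chi-square distribution whose degrees of freedom equal the number of extra rate parameters. The function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-h) * sum_{i < df/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (i + 0.5)
    return total


def likelihood_ratio_test(lnl_equal, lnl_free, extra_params):
    """Compare a constrained (equal-rate) model with a free-rate model.

    extra_params is the number of additional free parameters in the
    free-rate model (assumed here to be one extra rate per sister-pair).
    """
    statistic = max(2.0 * (lnl_free - lnl_equal), 0.0)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods for an alignment with 28 sister-pairs.
stat, p_value = likelihood_ratio_test(-10234.6, -10210.1, extra_params=28)
print(f"LRT statistic = {stat:.2f}, P = {p_value:.4f}")
```

The chi-square tail is computed directly from `math.erfc` and `math.gamma` so that the example has no dependencies beyond the standard library.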

**Rate Variation Test outputs**. PDF document containing outputs of tests of rate variation in all datasets used, comparing free-rate versus fixed-rate models across trees.


Linear regressions

We tested for associations between differences in clade size, body size and substitution rates, using linear regressions forced through the origin. Each contrast was calculated as ln(V_{A}) - ln(V_{B}), where ln(V_{i}) represents the log-transformed variable for Clade i
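A minimal sketch of this procedure is given below: each sister-pair yields one contrast, ln(V_A) - ln(V_B), and the response contrasts are regressed on the predictor contrasts with no intercept. The authors report running their statistics in R; Python is used here purely for illustration, and the function names and example values are hypothetical. The R^2 returned is the uncentred form usually reported for no-intercept models, which is an assumption about how the tabulated values were computed.

```python
import math


def log_contrast(value_a, value_b):
    """Contrast between sister clades A and B on the log scale."""
    return math.log(value_a) - math.log(value_b)


def regress_through_origin(x, y):
    """Least-squares fit of y = b * x with no intercept.

    Returns (slope, uncentred R^2, residual degrees of freedom).
    With n sister-pairs the residual d.f. is n - 1, because only the
    slope is estimated.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("contrasts are all zero")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared, len(x) - 1


# Hypothetical data: (clade size A, clade size B) and (rate A, rate B) per pair.
clade_sizes = [(120, 8), (45, 30), (12, 3), (200, 150)]
rates = [(0.021, 0.018), (0.015, 0.016), (0.030, 0.022), (0.011, 0.012)]

size_contrasts = [log_contrast(a, b) for a, b in clade_sizes]
rate_contrasts = [log_contrast(a, b) for a, b in rates]
slope, r2, df = regress_through_origin(rate_contrasts, size_contrasts)
print(f"slope = {slope:.4f}, R^2 = {r2:.4f}, d.f. = {df}")
```

Forcing the line through the origin reflects the fact that the direction of subtraction within a pair is arbitrary: swapping A and B flips the sign of both contrasts, so the expected response contrast is zero when the predictor contrast is zero.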

More distantly diverged sister-pairs are associated with more evolutionary change, and thus tend to generate contrasts of larger magnitude; this can lead to unequal variance between data points _{A }+ _{B})^{0.5}. We used the diagnostic methods recommended by Garland
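The standardisation step can be illustrated with a short sketch. It assumes that each contrast is divided by the square root of the summed branch lengths of the two lineages, which is the usual form of this correction; the symbols inside the expression above did not survive extraction, so the argument names `branch_a` and `branch_b` are hypothetical stand-ins.

```python
import math


def standardise_contrast(contrast, branch_a, branch_b):
    """Divide a contrast by (branch_a + branch_b) ** 0.5.

    branch_a and branch_b are the branch lengths leading to the two
    sister clades (an assumption about what the divisor contains).
    """
    total = branch_a + branch_b
    if total <= 0:
        raise ValueError("summed branch length must be positive")
    return contrast / math.sqrt(total)


# Hypothetical values: a raw contrast of 2.0 across branches of 1.0 and 3.0.
print(standardise_contrast(2.0, 1.0, 3.0))
```

Scaling in this way is intended to give contrasts from old and young sister-pairs comparable variance, so that no single deep divergence dominates the regression.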

To verify that our results were not dependent on the transformations or standardisations used, all statistics were also performed on non-transformed and non-standardised data, and the results did not differ. All statistics and diagnostic tests were performed in R

Correction for multiple tests

Our analysis resulted in a number of tests of three hypotheses:

In combining our tests of hypotheses of clade size against measures of rates of molecular evolution,

**Weighted Z Test calculations**. Excel spreadsheet containing values and calculations for Weighted Z test of multiple comparisons.


Results

Evidence of Substitution Rate Variation

A free-rate model, where a separate substitution rate was estimated for each branch, fitted the data significantly better than an equal-rate model, where terminal branches within a pair had equal substitution rates, for 4 of our 6 alignments. Free-rate models for

Mitochondrial Data

There were no significant associations between

Mitochondrial Family (Approximately) Level Contrasts

| Response Variable | Predictor Variable | Coefficient | R^{2} | d.f. | P value |
| --- | --- | --- | --- | --- | --- |
| ln(Clade Size) | ln( | -1.1185 | 0.1279 | 27 | 0.066 |
| ln(Clade Size) | ln( | -0.6133 | 0.0128 | 27 | 0.560 |
| ln(Clade Size) | ln(^{#} | -0.0865 | 0.0867 | 16 | 0.236 |
| ln(Clade Size) | ln(^{#} | -0.0077 | 0.2127 | 16 | 0.054 |
| ln(Clade Size) | ln(Body Size) | 0.1398 | 0.0130 | 27 | 0.545 |
| ln( | ln(Body Size) | -0.0073 | 0.0003 | 27 | 0.921 |
| ln( | ln(Body Size) | 0.0017 | 0.0001 | 27 | 0.968 |
| ln(^{#} | ln(Body Size) | -0.2046 | 0.0024 | 16 | 0.846 |
| ln(^{#} | ln(Body Size) | -0.2721 | 0.1371 | 16 | 0.130 |

Mitochondrial Deep Level Contrasts

| Response Variable | Predictor Variable | Coefficient | R^{2} | d.f. | P value |
| --- | --- | --- | --- | --- | --- |
| ln(Clade Size) | ln( | -0.6282 | 0.0199 | 8 | 0.698 |
| ln(Clade Size) | ln( | 2.8130 | 0.1062 | 8 | 0.358 |
| ln(Clade Size) | ln(Body Size) | -0.4004 | 0.0666 | 8 | 0.472 |
| ln( | ln(Body Size) | -0.0117 | 0.0011 | 8 | 0.927 |
| ln( | ln(Body Size) | -0.0339 | 0.0351 | 8 | 0.602 |

Mitochondrial Shallow Level Contrasts

| Response Variable | Predictor Variable | Coefficient | R^{2} | d.f. | P value |
| --- | --- | --- | --- | --- | --- |
| ln(Clade Size) | ln( | 0.2991 | 0.0051 | 25 | 0.722 |
| ln(Clade Size) | ln( | -0.5455 | 0.0069 | 25 | 0.683 |
| ln(Clade Size) | ln(^{#} | -1.0500 | 0.1087 | 23 | 0.107 |
| ln(Clade Size) | ln(^{#} | -0.0035 | 0.0010 | 23 | 0.88 |
| ln(Clade Size) | ln(Body Size) | 0.1566 | 0.0278 | 24 | 0.416 |
| ln( | ln(Body Size) | 0.0006 | 6 × 10^{-6} | 24 | 0.990 |
| ln( | ln(Body Size) | -0.0444 | 0.0960 | 24 | 0.123 |
| ln(^{#} | ln(Body Size) | 0.0280 | 0.0074 | 23 | 0.683 |
| ln(^{#} | ln(Body Size) | -1.1831 | 0.0159 | 23 | 0.553 |

**Tables 1, 2 and 3 - Regressions between rates, clade size and body size for mitochondrial sequence data**

Traits are measured as differences in values between sister-pairs of mammalian clades. Co-efficient: estimated co-efficient of the predictor variable; R^{2 }= co-efficient of determination; d.f: degrees of freedom in model. Synonymous substitution rates and dN/dS ratios (^{#}; all other rates were estimated in

In case synonymous substitution rates were overestimated by the particular model in

Therefore, as a

Welch

| Response Variable | Predictor Variable | Coefficient | R^{2} | d.f. | P value |
| --- | --- | --- | --- | --- | --- |
| ln(Clade Size) | ln( | -0.2485 | 0.0064 | 42 | 0.605 |
| ln(Clade Size) | ln( | -1.4968 | 0.1031 | 26 | 0.096 |
| ln(Clade Size) | ln( | 0.4371 | 0.0179 | 27 | 0.423 |
| ln(Clade Size) | ln(Body Size) | 0.0783 | 0.0066 | 42 | 0.600 |
| ln( | ln(Body Size) | 0.0545 | 0.0306 | 42 | 0.256 |
| ln( | ln(Body Size) | -0.1263 | 0.1728 | 25 | **0.031** * |
| ln( | ln(Body Size) | 0.0586 | 0.0338 | 36 | 0.269 |

**Table 4 - Regressions between rates, clade size and body size for mitochondrial sequence data of Welch et al**.

Traits are measured as differences in values between sister-pairs of mammalian clades. Co-efficient: estimated co-efficient of the predictor variable; R^{2} = co-efficient of determination; d.f: degrees of freedom in model; P value: significance of the model; Significance: * = P < 0.05, ** = P < 0.005.

We did not detect a significant relationship between body size and our estimates of

There were no significant associations between body size and clade size in any of our mitochondrial datasets.

Nuclear Data

We did not find any association between clade size and any of the measures of substitution rate (^{2 }= 0.1857, P = 0.0453: Table

Mammalia Nuclear Contrasts

| Response Variable | Predictor Variable | Coefficient | R^{2} | d.f. | P value |
| --- | --- | --- | --- | --- | --- |
| ln(Clade Size) | ln( | -0.5432 | 0.0252 | 25 | 0.421 |
| ln(Clade Size) | ln( | 0.0987 | 0.0034 | 26 | 0.765 |
| ln(Clade Size) | ln( | -1.2561 | 0.1022 | 26 | 0.097 |
| ln(Clade Size) | ln( | -0.4149 | 0.0094 | 23 | 0.645 |
| ln(Clade Size) | ln(Body Size) | 0.0414 | 0.0022 | 31 | 0.793 |
| ln( | ln(Body Size) | -0.1062 | 0.1569 | 25 | **0.041** * |
| ln( | ln(Body Size) | -0.1292 | 0.2794 | 25 | **0.004** ** |
| ln( | ln(Body Size) | 0.0287 | 0.0154 | 26 | 0.529 |
| ln( | ln(Body Size) | -0.1237 | 0.3384 | 22 | **0.002** ** |

Eutheria Nuclear Contrasts

| Response Variable | Predictor Variable | Coefficient | R^{2} | d.f. | P value |
| --- | --- | --- | --- | --- | --- |
| ln(Clade Size) | ln( | 0.8154 | 0.05681 | 18 | 0.312 |
| ln(Clade Size) | ln( | 1.2337 | 0.1111 | 18 | 0.151 |
| ln(Clade Size) | ln( | -1.6541 | 0.1499 | 15 | 0.125 |
| ln(Clade Size) | ln( | 1.8792 | 0.1839 | 20 | **0.0453** * |
| ln(Clade Size) | ln(Body Size) | -0.1453 | 0.0308 | 22 | 0.4123 |
| ln( | ln(Body Size) | -0.2412 | 0.3446 | 18 | **0.0065** * |
| ln( | ln(Body Size) | -0.1759 | 0.2149 | 18 | **0.0395** * |
| ln( | ln(Body Size) | 0.1272 | 0.1433 | 15 | 0.134 |
| ln( | ln(Body Size) | -0.0925 | 0.2397 | 20 | **0.0208** * |

Metatheria Nuclear Contrasts

| Response Variable | Predictor Variable | Coefficient | R^{2} | d.f. | P value |
| --- | --- | --- | --- | --- | --- |
| ln(Clade Size) | ln( | 2.2613 | 0.1736 | 6 | 0.304 |
| ln(Clade Size) | ln( | -1.6842 | 0.2381 | 6 | 0.221 |
| ln(Clade Size) | ln( | 1.5536 | 0.3391 | 6 | 0.132 |
| ln(Clade Size) | ln( | -2.6421 | 0.1202 | 6 | 0.401 |
| ln(Clade Size) | ln(Body Size) | 0.7831 | 0.3792 | 6 | 0.14 |
| ln( | ln(Body Size) | 0.1231 | 0.2758 | 6 | 0.181 |
| ln( | ln(Body Size) | -0.0644 | 0.0306 | 6 | 0.679 |
| ln( | ln(Body Size) | 0.1876 | 0.1549 | 6 | 0.335 |
| ln( | ln(Body Size) | -0.0094 | 0.0031 | 6 | 0.895 |

**Tables 5, 6 and 7 - Regressions between rates, clade size and body size for nuclear sequence data.**

Traits are measured as differences in values between sister-pairs of mammalian clades. Co-efficient: estimated co-efficient of the predictor variable; R^{2} = co-efficient of determination; d.f: degrees of freedom in model; P value: significance of the model; Significance: * = P < 0.05, ** = P < 0.005.

Body size was significantly negatively associated with

The MLE method of body mass contrast estimation assumes homogeneity of variance in body size between both clades in a sister pair. We found that this assumption was not valid for a minority of contrasts (Additional File

Correction for multiple tests

Weighted Z tests indicate there is no association between clade size and ^{-5}, ^{-4}; Table

Z Test Results on Multiple P Values

| Response | Predictor | n | Weighted Z | P value |
| --- | --- | --- | --- | --- |
| Clade Size | | 6 | -0.1190 | 0.4526 |
| Clade Size | | 7 | 0.9506 | 0.8291 |
| Clade Size | | 6 | 1.4619 | 0.9281 |
| Clade Size | | 6 | 0.8970 | 0.8151 |
| Clade Size | Body Size | 7 | 0.7700 | 0.7794 |
| | Body Size | 7 | -0.7400 | 0.229 |
| | | 3 | -2.9344 | **0.0017** ** |
| | | 4 | 0.7970 | 0.7873 |
| | Body Size | 6 | -3.2518 | **0.000573** ** |
| | Body Size | 6 | -3.2421 | **0.000593** ** |
| | Body Size | 6 | -1.2667 | 0.1026 |

**Table 8 - Results of weighted Z tests for multiple comparisons.**

Weighted Z: the combined weighted value for multiple Z scores for each individual test; n: number of tests across which Weighted Z score was calculated; P value: significance of Weighted Z score. Significance: * = P < 0.05, ** = P < 0.005.
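For readers unfamiliar with the weighted Z method, the sketch below shows one standard formulation: each P value is converted to a Z score, the scores are combined as a weighted sum divided by the square root of the summed squared weights, and the combined score is converted back to a P value. The values in Table 8 appear consistent with the convention P = Φ(Z) used here (for example, Φ(-0.1190) ≈ 0.4526). The choice of weights is not stated in the text, so the weights in the example are hypothetical, as are the function name and the input P values.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def weighted_z(p_values, weights):
    """Combine one-tailed P values with the weighted Z method.

    Z_i = inverse normal CDF of p_i
    Z_w = sum(w_i * Z_i) / sqrt(sum(w_i ** 2))
    Returns (Z_w, combined P), where combined P is the normal CDF of Z_w.
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("need one weight per P value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P values must lie strictly between 0 and 1")
    z_scores = [_STANDARD_NORMAL.inv_cdf(p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_w = numerator / denominator
    return z_w, _STANDARD_NORMAL.cdf(z_w)


# Hypothetical one-tailed P values, weighted here by degrees of freedom.
example_p = [0.03, 0.20, 0.45]
example_w = [27, 8, 25]
z_combined, p_combined = weighted_z(example_p, example_w)
print(f"Weighted Z = {z_combined:.4f}, P = {p_combined:.4f}")
```

Weighting lets tests based on more sister-pairs contribute more to the combined result than tests based on only a few.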

Discussion

We have found no evidence for a link between net diversification and substitution rate in mammals. We did not find a significant relationship between clade size and total substitution rate (

There are a number of explanations for our failure to detect a relationship between substitution rates and clade size in mammals: (1) the relationship exists but our analyses do not have the power to detect it; (2) the relationship exists, but is confounded by other processes in mammals; and (3) the relationship between clade size and substitution rates is not universal and does not exist in mammals.

We cannot rule out a lack of power producing the results we report here, but we do not consider this the most likely explanation for our results. We were able to detect a significant relationship between body size and substitution rates in both our nuclear data and the mitochondrial data from Welch

It is possible that there is an association between substitution rates and clade size in mammals, but that this relationship is masked by interactions with other variables. For instance, it has been suggested that abundance (measured as group size or population density) is positively linked to diversification rate in mammals

Perhaps a more likely explanation for the lack of an association between substitution rates and clade size in mammals is that the relationship does not exist for this group. Previous explanations of the association between rates of molecular evolution and clade size have focused on three possible causes: (i) speciation causes increases in substitution rates; (ii) mutation rates drive diversification; and (iii) both diversification and substitution rate are linked to another factor.

Some previous studies have explained a positive association between net diversification and substitution rate as the result of the demographic and selective processes characterising speciation _{e }

A recent study indicated that the correlation between substitution rate and clade size in birds might be driven by the effect of mutation rates on the process of diversification

It is also possible that the positive association between rates of molecular evolution and clade size observed in some taxa is not due to a direct effect of speciation on molecular evolution, or vice versa, but the result of another variable driving both processes independently of each other, leading to an indirect correlation between the two.

Many life-history correlates of substitution rate in mammals have been identified

Conclusions

Contrary to patterns observed in other taxa, we have not detected a relationship between clade size and substitution rate in mammals, whether measured as total, synonymous or non-synonymous substitution rates, in either nuclear or mitochondrial genes. Given that our study is likely to have comparable power to other similar studies, these results suggest that any association between net diversification and substitution rate is either absent or very weak in mammals.

Authors' contributions

XG, RL and LB designed the analyses; XG performed the analyses; XG, RL and LB wrote the manuscript. All authors have read and approved the final manuscript.

Acknowledgements

Thanks to Marcel Cardillo for providing the mammalian super-tree and assistance with running the MLE analysis. Thank you to Simon Ho, Dorothee-Marie Huchon-Pupko and Geeta Eick for providing additional phylogenetic trees. Thank you to Matt Phillips for assistance with statistical analyses. Thanks to John Welch for providing assistance with running the MLE analysis. We thank two anonymous reviewers for their thorough work, which greatly improved this article.