School of Biological Sciences, University of Queensland, Brisbane, 4072 Queensland, Australia

Department of Biology, École Normale Supérieure, 45 rue d’Ulm, 75005 Paris, France

Abstract

Background

Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable.

Methods

We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses.

Results

We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS.

Conclusions

Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language.

Background

Comparative analysis is a central tool in evolutionary biology and ecology: if we wish to understand the co-evolution of traits and their relationships with their environment, comparisons among species can identify relationships among traits and environmental variables that signify underlying evolutionary or ecological processes. The use of comparative studies allows biologists to address important concepts like adaptation

Often, comparative studies summarise relationships using correlation or regression coefficients. Such analyses require special tools to take into account the phylogeny of species, as their shared evolutionary histories lead to phylogenetic structure in the data (a specific form of non-independence of data)

**Transformation from a phylogenetic tree to a variance-covariance matrix under the Brownian Motion (BM) model: the variance is set to be the branch length from the root to the tip.** The covariance is the branch length from the root to the most recent common ancestor.
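The transformation described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-species tree, its branch lengths and the names `TREE`, `path_to_root` and `bm_vcv` are hypothetical and are not taken from the analyses reported here.

```python
# Hypothetical 3-species tree, stored as child -> (parent, branch length).
# Newick equivalent: ((A:1,B:1)AB:2,C:3)root;
TREE = {
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "AB": ("root", 2.0),
    "C": ("root", 3.0),
}


def path_to_root(tree, node):
    """Return the (child, branch length) edges met when walking from node to the root."""
    edges = []
    while node in tree:
        parent, length = tree[node]
        edges.append((node, length))
        node = parent
    return edges


def bm_vcv(tree, tips):
    """Variance-covariance matrix under Brownian Motion.

    Entry (a, b) is the total length of the branches shared by the
    root-to-a and root-to-b paths, i.e. the root-to-tip distance on the
    diagonal and the root-to-MRCA distance off the diagonal.
    """
    paths = {tip: dict(path_to_root(tree, tip)) for tip in tips}
    vcv = []
    for a in tips:
        row = []
        for b in tips:
            shared = set(paths[a]) & set(paths[b])  # edges on both paths
            row.append(sum(paths[a][edge] for edge in shared))
        vcv.append(row)
    return vcv


TIPS = ["A", "B", "C"]
SIGMA = bm_vcv(TREE, TIPS)
for name, row in zip(TIPS, SIGMA):
    print(name, row)
```

In this toy tree, A and B share the internal branch of length 2, so their covariance is 2, while C shares no branch with either and has covariance 0.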

We can fit a linear regression between data vector

where

This "phylogenetic" regression can easily be computed using Generalised Least Squares, GLS
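As an illustration of the GLS computation, the following Python sketch evaluates the standard GLS estimator, (X'Σ⁻¹X)⁻¹X'Σ⁻¹y, with a small hand-written linear solver. The covariance matrix, the design matrix and the response values are hypothetical toy inputs, and the helper names (`solve`, `matmul`, `gls`) are our own; in practice a dedicated routine from a statistics package would be used.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is a square matrix and b a matrix, both given as lists of rows.
    """
    n = len(a)
    # Augmented matrix [a | b]; copies keep the inputs unchanged.
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gls(x, y, sigma):
    """GLS estimate of the regression coefficients for covariance sigma."""
    si_x = solve(sigma, x)                  # Sigma^-1 X
    si_y = solve(sigma, [[v] for v in y])   # Sigma^-1 y
    xt = transpose(x)
    beta = solve(matmul(xt, si_x), matmul(xt, si_y))
    return [b[0] for b in beta]


# Toy inputs (hypothetical): three species, intercept plus one predictor.
sigma = [[3.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [5.0, 7.0, 9.0]  # noise-free values generated as 5 + 2 * predictor
beta_hat = gls(x, y, sigma)
print(beta_hat)
```

Because the toy response is noise-free, the estimator recovers the generating intercept and slope whatever positive-definite covariance matrix is supplied; with noisy data, the covariance matrix changes both the estimates and their standard errors.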

Ideally, we should directly incorporate phylogenetic uncertainty into our models, because this will give us a more "honest" analysis, with correct p-values and estimated parameter distributions that more fully represent the current state of our knowledge. Assuming neither phylogenetic nor measurement uncertainty may lead to bias and may severely overestimate our confidence in the conclusions. Since it can be difficult to derive an accurate tree, comparative studies should allow for uncertainty in the phylogeny

Bayesian statistics is based on Bayes’ Theorem, which can be expressed by the equation:

where

where

Here, we show how phylogenetic uncertainty can be incorporated in many of the models that biologists commonly employ. BUGS code for each model is provided in an Additional file

**Appendix.** BUGS code for the models.

Results

Data analysis & results

Linear regression model with simulated data

Using this simple model, we will illustrate the value of using empirical tree distributions for comparative analysis. As an example of frequentist analysis tools, we will use the R function

We used a set of 100 trees from the posterior distribution of a phylogeny of rainforest plant species generated using BEAST in an analysis of trnL-F chloroplast sequences (J. Wells, unpubl.). We chose 100 trees as being a reasonable compromise between the sampling error of the trees and computational convenience (see the technical discussion for details of memory usage and computation times for up to 10,000 trees). For simulations, we selected one tree to be the "correct" tree (Figure
β_{0}=5, β_{1}=2 and residual variance … β_{0}, β_{1} and … β_{0} and β_{1}: estimates based on true GLS and Bayesian AT are more precise than estimates based on methods using the consensus tree. When comparing the average widths of 95% confidence or credible intervals as a measure of uncertainty, we see that uncertainty is higher for methods based on the single consensus tree. Even though these consensus-tree based estimates are more uncertain, they also yield higher Type 1 error rates for the confidence or credible intervals (i.e. the proportion of times that the estimated interval does not contain the true parameter value). These error rates are approximately twice as high as expected for the slope β_{1} (i.e. roughly 10% error, rather than the nominal 5%). This anti-conservative coverage despite higher uncertainty is likely to originate from the lower precision of estimates based on single consensus trees. Note that credible intervals based on the Bayesian AT method can yield anti-conservative coverage as well (Table

**Distribution of the individual measurements sppp-values for the Measurement Error (ME) model.** Solid lines are the 2.5% and 97.5% limits, dashed line is for 50%.

| **True σ** | **Estimation method** | **β_{0} CI size** | **β_{0} Error rate** | **β_{1} CI size** | **β_{1} Error rate** | **σ CI size** | **σ Error rate** |
|---|---|---|---|---|---|---|---|
| 2 | Real GLS | 6.12 | 0.002^{∗} | 0.23 | 0.121^{∗} | 1.74 | 1.000^{∗} |
| 2 | True GLS | 3.39 | 0.044 | 0.20 | 0.058 | 0.83 | 0.041 |
| 2 | Bayes AT | 3.26 | 0.057 | 0.20 | 0.063^{∗} | 0.86 | 0.037^{∗} |
| 2 | Bayes OT | 5.99 | 0.002^{∗} | 0.22 | 0.131^{∗} | 1.67 | 1.000^{∗} |
| 5 | Real GLS | 15.34 | 0.003^{∗} | 0.57 | 0.125^{∗} | 4.37 | 1.000^{∗} |
| 5 | True GLS | 8.50 | 0.057 | 0.49 | 0.059 | 2.07 | 0.050 |
| 5 | Bayes AT | 8.11 | 0.070^{∗} | 0.49 | 0.066^{∗} | 2.14 | 0.045 |
| 5 | Bayes OT | 15.02 | 0.003^{∗} | 0.56 | 0.133^{∗} | 4.18 | 1.000^{∗} |
| 10 | Real GLS | 30.43 | 0.003^{∗} | 1.14 | 0.115^{∗} | 8.66 | 1.000^{∗} |
| 10 | True GLS | 16.93 | 0.053 | 0.99 | 0.044 | 4.13 | 0.056 |
| 10 | Bayes AT | 16.16 | 0.069^{∗} | 0.99 | 0.056 | 4.27 | 0.051 |
| 10 | Bayes OT | 29.78 | 0.004^{∗} | 1.11 | 0.123^{∗} | 8.30 | 1.000^{∗} |
| 15 | Real GLS | 45.72 | 0.002^{∗} | 1.70 | 0.119^{∗} | 13.01 | 1.000^{∗} |
| 15 | True GLS | 25.36 | 0.051 | 1.46 | 0.054 | 6.19 | 0.051 |
| 15 | Bayes AT | 24.18 | 0.070^{∗} | 1.46 | 0.062^{∗} | 6.38 | 0.048 |
| 15 | Bayes OT | 44.72 | 0.003^{∗} | 1.66 | 0.128^{∗} | 12.47 | 1.000^{∗} |

Simulated data were β_{0}=5, β_{1}=2 and several levels for σ. Error rates marked with an asterisk (^{∗}) are significantly different from the expected 5% error rate (Binomial test).
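The asterisks in the table flag error rates that differ from the nominal 5% level according to a binomial test. A minimal Python sketch of such a test follows. It assumes a hypothetical number of simulated data sets (`n_sim = 1000`, which is not stated in the text above) and uses an exact two-sided test that sums the probabilities of all outcomes no more likely than the observed one; the function names are ours.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass, computed on the log scale for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value for k 'errors' out of n trials under rate p."""
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so that ties with the observed outcome count.
    threshold = observed * (1.0 + 1e-7)
    total = sum(
        pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
        if pmf <= threshold
    )
    return min(1.0, total)


n_sim = 1000      # hypothetical number of simulated data sets
nominal = 0.05    # expected error rate of a 95% interval

for errors in (50, 70, 120):
    p_value = binom_test_two_sided(errors, n_sim, nominal)
    print(errors / n_sim, p_value)
```

With this hypothetical `n_sim`, an observed error rate equal to the nominal 5% gives a p-value close to one, whereas a rate of 12% is strongly rejected.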

Linear regression model with real data

To check the behaviour of our models with real data, we used real trait measurements (stem-tissue density and leaf-tissue density) for seedlings of the species in the rainforest phylogeny mentioned above (J. Wells, unpubl.). We modelled this data set using the simple Linear Regression model in its frequentist (GLS) and Bayesian form, and a regression model incorporating Pagel’s λ … β_{1}. This probably resulted from the consensus tree being a poor summary of the true tree, which is supported by the low estimation of

| **Parameter** | **LR model: GLS** | **LR model: Bayesian (AT)** | **PL model: GLS** | **PL model: Bayesian (AT)** |
|---|---|---|---|---|
| β_{0} | -1.07 (0.47) | -0.70 (0.31) | -0.75 (0.14) | -0.50 (0.11) |
| β_{1} | 0.31 (0.13) | 0.62 (0.10) | 0.55 (0.11) | 0.58 (0.11) |
| λ | n/a | n/a | 0.24 (-0.12, 0.63) | 0.82 (0.13) |
| σ | 1.18 | 0.70 (0.074) | 0.33 | 0.77 (0.095) |
| ppp-value | n/a | 0.545 | n/a | 0.9974 |

Values are:

The phylogenetic signal strength is estimated in the response variable … β_{0} and β_{1} compared to the Linear Regression model discussed above. However, although the ppp-value for the Linear Regression model is good (0.545; see Figure
β_{1} is [0.42, 0.82]: the slope is clearly positive. We conclude that the density of seedling leaf tissue scales positively with the density of the stem, as predicted if species with higher density (and thus better protected) seedling leaves also invest in more robust stems.

**Phylogeny of the "correct" tree used for simulated data.** Number of species: 50.

Measurement error model

The previous example analysed data for which there was only one datum per species. The data set was actually a subset of a larger data set with replicated measurements for each species, in order to model variation among individuals and/or variation due to technical measurement error (conceived broadly as "Measurement Error"). Instead of a vector of species measurements as for the Linear Regression model, we constructed matrices of individual measurements (columns), for each species (rows). Note that, in the Measurement Error model, matrices of measurements … β_{1} of [0.46, 0.99]. In posterior checks, the ppp-value for estimates of the species-level values was acceptable. However, the distribution of sppp-values based on the individual measurements showed a slight but consistent overdispersion of the replicates compared to the real data distribution (see Figure
σ_{V} and σ_{W} were not constant across species, for example if some species contain a wider range of genetic variants or show higher phenotypic plasticity in the expression of a trait.

| **Parameters** | **ME model** | **LR model** |
|---|---|---|
| β_{0} | -0.59 (0.37) | -0.70 (0.31) |
| β_{1} | 0.72 (0.13) | 0.62 (0.10) |
| σ_{R} | 0.59 (0.078) | 0.70 (0.074) |
| σ_{V} | 0.15 (0.0082) | n/a |
| σ_{W} | 0.14 (0.00074) | n/a |
| ppp-value | 0.323 | 0.545 |

Values are:

**Distribution of estimates for β_{0}, β_{1} and**

Computational performance

We performed an analysis of simulation time and memory use for our models. The two main factors that may influence simulation performance are the number of species

**Simulation time as a function of the number of species for the Linear Regression (LR) model (with 100 trees).**

**Simulation time as a function of the number of species for all models in OpenBUGS with 100 trees.**

**Memory usage in OpenBUGS for the linear regression model.**

Discussion

We have shown that Bayesian methods for phylogenetic comparative analysis are easy to implement in the BUGS language, often only requiring several lines of code. This puts Bayesian methods within the reach of all researchers who wish to adopt the Bayesian mode of inference for phylogenetic comparative analyses. Since Bayesian methods provide a natural way of incorporating identifiable sources of error into an analysis, we believe Bayesian methods should become more common in comparative studies. We emphasise that failing to account for obvious sources of uncertainty in a statistical analysis is very likely to lead to more imprecise estimates (Figure

**Distribution of the χ^{2}-like discrepancy difference for the linear regression model with empirical prior and real data.** The ppp-value is the proportion of values above zero.

Bayesian methods allow the modelling of multiple sources of uncertainty through the explicit use of prior distributions on model parameters. Because they require the quantitative representation of parameter uncertainty, Bayesian methods offer an excellent framework for the integrated analysis of comparative data that contain several sources of uncertainty, such as phylogenetic error and measurement error. Allowing for several sources of error in a frequentist analysis is more difficult, although

Although we have only explored three possible phylogenetic comparative models here, it is clear that the BUGS formalism is likely to be able to represent almost any reasonable Bayesian comparative model. Further, researchers can use our programs as building blocks to modify and combine analyses. For example, it is easy to combine the Measurement Error and Phylogenetic Signal models to form a measurement error model which simultaneously estimates phylogenetic signal.

We have demonstrated how to model phylogenetic uncertainty using an empirical prior set of trees derived from the output of Bayesian phylogenetic tree estimation programs. Use of this empirical prior is most attractive, because our simulations show that the estimates of regression coefficients are more precise and unbiased for residual variance (Figure

Technical choices

Both OpenBUGS and JAGS use Markov Chain Monte Carlo (MCMC) algorithms and are based on the BUGS syntax. JAGS has a more flexible interpretation of the BUGS syntax than OpenBUGS, allowing the simplification of some parts of the computation (see Additional file

In this study, we used a relatively small number of sampled trees (usually 100) for computational convenience. However, for a real study, using a larger number of trees is expected to better represent their true probability distribution, and hence to decrease the Monte Carlo error and the impact of any very unlikely tree. We have seen that the number of trees

Issues & perspectives

For some data sets with a large number of species and a small number of trees (for example

The results presented here all use the simple Brownian Motion (BM) model of character evolution, but one can use any other model in the process of computing variance-covariance matrices (e.g. models proposed by

The Measurement Error model enables us to estimate the linear relation between two variables when both are random, and so we aim to estimate their joint variation rather than assigning a direction of prediction from an ’explanatory’ variable to a ’response’. It is also free from the need to assume that the error variances of X and Y are equal, or that the ratio of error variances equals the ratio of variances (as is required in Major Axis methods or Standardised Major Axis methods, see

Conclusions

Why should researchers interested in performing phylogenetic comparative analyses choose to use our Bayesian methods over traditional frequentist methods? As we have demonstrated, Bayesian methods allow a lot of flexibility in the type of models that can be fitted, and Bayesian statistics provides a natural way of incorporating identified sources of uncertainty through the use of prior distributions. A central problem for frequentist phylogenetic comparative models has been that the regression estimators assume that the phylogeny is known without error. Although several authors have proposed methods to deal with phylogenetic uncertainty, few have become accessible to biologists through software applications (but see

In this study, we have concentrated on providing models that can be easily understood in the BUGS model programming language, and implemented using the user-friendly OpenBUGS program. We believe that most biologists who are new to Bayesian modelling will probably use this program (or a similar BUGS system, such as WinBUGS or JAGS). These programs have been designed for extreme flexibility in the types of models that can be fitted. However, this flexibility can be traded off against the speed of computation, compared to software that is more constrained in the types of models that can be fitted. One example of this is Hadfield’s MCMCglmm for R

While this study is based on the ideas of

Finally, we wish to emphasise the importance of model checking. Bayesian methods have been adopted enthusiastically by many researchers, but in promoting Bayesian methods, model checking is often overlooked, e.g.

Methods

Notation

Here, and for the rest of the paper, we use the following notation: … σ^{2} can be directly interpreted as the residual variance and

where

Linear regression model

In order to illustrate the practical nature of our methods, we first give a simple example. One classic model for comparative analysis is a linear regression across a multispecies data set. To construct it, we used a multivariate normal for the likelihood and conjugate priors. The model can be specified as follows:

The priors on the components of ^{−2 } is weakly informative for small variance
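As a sketch of the specification described in this section, the model could be written as follows. The notation is assumed rather than taken from the original: a response vector y of length n, a design matrix X, K covariance matrices Σ_k computed from the sampled trees and given equal prior weight, and illustrative hyperparameters τ², a and b.

```latex
\begin{aligned}
k &\sim \mathrm{Categorical}\!\left(\tfrac{1}{K}, \dots, \tfrac{1}{K}\right) \\
\mathbf{y} \mid \boldsymbol{\beta}, \sigma^{2}, k &\sim \mathcal{N}_{n}\!\left(\mathbf{X}\boldsymbol{\beta},\ \sigma^{2}\boldsymbol{\Sigma}_{k}\right) \\
\beta_{j} &\sim \mathcal{N}\!\left(0, \tau^{2}\right), \qquad j = 0, 1 \\
\sigma^{-2} &\sim \mathrm{Gamma}(a, b)
\end{aligned}
```

The first line is what makes the prior on the phylogeny "empirical": each MCMC iteration draws one of the K sampled trees, so the posterior of the regression parameters is averaged over the tree set rather than conditioned on a single tree.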

Measurement error model accounting for intraspecific variation

Comparative analyses frequently represent each species by a single value, such as a mean estimated from a small sample of individuals. Often, the intraspecific variance in trait values is not considered. Such variance can arise from sources including meaningful biological variation among individuals, inaccuracies of measurement, or poor sampling. Analyses that do not consider such "measurement error" may lead to biases or inaccuracies in evolutionary inferences

Here we develop a model for the relationship between two traits across species, and incorporate variation across individuals within species, by using measurements from multiple individuals per species. The forms of intraspecific variation that this model can incorporate are:

Here we focus on the situation where one measurement was taken per individual, and hence we treat natural variation and measurement error

We take several … σ_{R}. Note that both

After initial experimentation, we found that a weakly informative prior on the species level _{0}=0.5 and _{0}=0.5 for a trait known to be between zero and one) enabled the model to return more stable estimates for

Phylogenetic signal model

It is often of interest to quantify the strength of phylogenetic signal

where **I** is the identity matrix. The model can then be written as:

This model estimates the regression coefficients
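The λ transformation of the covariance matrix can be sketched in a few lines of Python. The form used here, λΣ + (1 − λ)I, is an assumption consistent with the mention of the identity matrix above; it coincides with the usual scaling of the off-diagonal elements only when Σ has a unit diagonal (for example an ultrametric tree rescaled to unit height). The function name and the toy matrix are illustrative.

```python
def pagel_lambda(sigma, lam):
    """Return lam * sigma + (1 - lam) * I for a square matrix sigma.

    Assumes sigma has ones on its diagonal, so that the diagonal is left
    unchanged and every off-diagonal covariance is multiplied by lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie between 0 and 1")
    n = len(sigma)
    return [
        [lam * sigma[i][j] + (1.0 - lam) * (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Toy correlation-like matrix for three species (hypothetical values).
sigma = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

no_signal = pagel_lambda(sigma, 0.0)    # identity: species are independent
half_signal = pagel_lambda(sigma, 0.5)  # covariances halved
full_signal = pagel_lambda(sigma, 1.0)  # Brownian Motion expectation
print(half_signal)
```

Values of λ near zero therefore correspond to data with no phylogenetic structure, and values near one to the structure expected under Brownian Motion on the given tree.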

Model checking

A fundamental part of statistical modelling is checking the goodness-of-fit of the model to the data. That is, does the model adequately capture the properties of the data? This procedure is called "posterior checking" in the Bayesian framework
χ^{2} function suggested by
χ^{2}, we will use the

where

The essence of the posterior predictive check is to compute this distribution for hypothetical replicates of the data ^{rep} and see if the value for the data ^{rep}, it is necessary to integrate over all the possible parameter values. One solution is to draw

and compute the ^{∗}(

Discrepancy values are used to compare the dispersion of the replicates to the dispersion of the data and detect potential outliers or consistent over- and underdispersion (see examples in Figures

For the Measurement Error model, we split the posterior checking into two parts. We assessed the estimates for the parameters of the linear relation, and for the species-level values

The ppp-value is not the probability of the model being true. Rather, it is the probability of observing more extreme data than the current data set, given the model assumptions, the posterior distribution of parameters and the discrepancy statistic. Therefore, our use of ppp-values is solely to assess how "surprising" the data appear to be under the model assumptions and the parameter estimates. If the ppp-value is very extreme (close to zero or one), this alerts us to possible structural problems in the model, since it means that the distribution of data simulated from the model differs from the data we actually observed for a particular aspect of the model (distribution of residuals, mean, etc.). This can help to identify aspects of the model that are failing to represent the data adequately and should be altered (see an example in our Real Data analysis with Phylogenetic Signal model). Unlike classical p-values, the Bayesian ppp-values are not necessarily uniformly distributed under the null hypothesis and should not be compared across models or be used to set a permissible type I error rate (false rejection of the model,
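The logic of a posterior predictive check can be illustrated with a short Python sketch. To stay self-contained it makes several simplifying assumptions that differ from our models: observations are treated as independent normal draws (no phylogenetic covariance), the "posterior draws" of the mean and standard deviation are simulated stand-ins rather than real MCMC output, and the discrepancy is a simple sum of squared standardised residuals. All names are hypothetical.

```python
import random


def chi2_discrepancy(values, mean, sd):
    """Chi-square-like discrepancy: sum of squared standardised residuals."""
    return sum(((v - mean) / sd) ** 2 for v in values)


def ppp_value(data, posterior_draws, rng):
    """Proportion of draws where the replicate is more discrepant than the data."""
    above = 0
    for mean, sd in posterior_draws:
        # One replicated data set per posterior draw of the parameters.
        replicate = [rng.gauss(mean, sd) for _ in data]
        diff = (chi2_discrepancy(replicate, mean, sd)
                - chi2_discrepancy(data, mean, sd))
        if diff > 0:
            above += 1
    return above / len(posterior_draws)


rng = random.Random(1)
# Toy data and stand-in posterior draws (mean, sd); not real MCMC output.
data = [rng.gauss(5.0, 2.0) for _ in range(50)]
draws = [(rng.gauss(5.0, 0.3), abs(rng.gauss(2.0, 0.2))) for _ in range(2000)]

ppp = ppp_value(data, draws, rng)
print("ppp-value for data consistent with the model:", ppp)

# Data shifted far from what the model expects give an extreme ppp-value.
shifted = [v + 20.0 for v in data]
print("ppp-value for shifted data:", ppp_value(shifted, draws, rng))
```

The second call shows the diagnostic use of the ppp-value: when the observed data are far more discrepant than any replicate, the proportion of positive differences collapses towards zero.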

If the interest is in comparing β_{i} to a particular value, you can simply give the posterior probability that β_{i} falls in any particular range of values. For example, you might want to know the probability of values less than zero, or greater than zero, or within a certain distance of zero. Bayesian inference enables us to make a direct statement about this probability, rather than accepting or rejecting a point hypothesis with an assumed significance level. The probability is equal to the proportion of the area under the probability density function that falls in a particular range. For example, if we were interested in whether β_{i} was greater than or less than zero, and the posterior distribution had only 1% of its area in a lower tail extending into negative numbers, then we would conclude that the probability that β_{i} is less than zero, given the data, is 0.01. By the same finding, the probability that β_{i} is positive is 99%.
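With MCMC output, this probability is estimated as the proportion of posterior samples that fall in the range of interest. A minimal Python sketch, in which the samples are simulated stand-ins (a normal distribution with hypothetical mean 0.62 and standard deviation 0.10) rather than real MCMC output, and the function name is ours:

```python
import random


def posterior_probability(samples, lower=float("-inf"), upper=float("inf")):
    """Proportion of posterior samples strictly between lower and upper."""
    inside = sum(1 for s in samples if lower < s < upper)
    return inside / len(samples)


rng = random.Random(7)
# Stand-in for the MCMC samples of a slope parameter (hypothetical values).
slope_samples = [rng.gauss(0.62, 0.10) for _ in range(10000)]

p_positive = posterior_probability(slope_samples, lower=0.0)
p_near_zero = posterior_probability(slope_samples, lower=-0.1, upper=0.1)
print("P(slope > 0 | data)    =", p_positive)
print("P(|slope| < 0.1 | data) =", p_near_zero)
```

The same function answers any interval question about the parameter, which is the practical advantage over testing a single point hypothesis.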

Implementation of models and data analysis

The general aim for our models is to estimate the posterior distribution of parameters of a model where the data are correlated through a phylogenetic relationship for which we have a prior distribution. The two main assumptions of our models are

We used the statistics software R

MCMC algorithms sometimes exhibit excessive autocorrelation among successive values in the chain, leading to inefficient sampling of the full parameter space if the dependence among samples extends for more than a few iterations. If the autocorrelation is high for a parameter, it may be necessary to let the simulation run longer and take a subsample of the MCMC output. We discuss autocorrelation issues for each model, below, along with other features of their application.
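The effect of subsampling an autocorrelated chain can be illustrated with a short Python sketch. The chain below is a simulated first-order autoregressive series standing in for MCMC output; its autoregressive coefficient (0.9), its length and the thinning interval (10) are arbitrary choices made for illustration.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag))
    return cov / var


def thin(chain, step):
    """Keep every step-th value of the chain."""
    return chain[::step]


rng = random.Random(42)
# Simulated stand-in for a strongly autocorrelated MCMC chain.
chain = [0.0]
for _ in range(20000):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

thinned = thin(chain, 10)
print("lag-1 autocorrelation, full chain:", autocorrelation(chain, 1))
print("lag-1 autocorrelation, thinned:   ", autocorrelation(thinned, 1))
print("samples kept:", len(thinned))
```

Thinning reduces the dependence between retained samples at the cost of discarding most of the chain, which is why a longer run is needed to keep the same number of samples.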

We report ’ppp-values’ (posterior predictive p-values) as an indicator of probabilities of the observed data, under the best-fit model; and ’sppp-values’ (species ppp-values, for each species) as an indicator of the under or overdispersion in replicate datasets generated under the best-fit model (see "Model checking" section).

Authors’ contributions

SPB conceived the study, wrote some of the BUGS code, and contributed to writing the manuscript. PdV wrote most of the BUGS code, conducted all analyses, and contributed to writing the manuscript. JAW conceived the Measurement Error model and assisted in analyses, and JAW and RDE both contributed data and helped write the manuscript. All authors have read and approved the final manuscript.

Acknowledgements

This research was funded by Discovery Project grant DP0878542 to SPB from the Australian Research Council. K. Cheney, D. O. Fisher, F. Frentiu and M. Woolfit provided helpful comments on previous drafts of this paper. L. Cook provided timely botanical advice. J. Felsenstein and other anonymous reviewers provided helpful comments and discussions.