Abstract
Background
A first step in building a mathematical model of a biological system is often the analysis of the temporal behaviour of key quantities. Mathematical relationships between the time and frequency domain, such as Fourier Transforms and wavelets, are commonly used to extract information about the underlying signal from a given time series. This onetoone mapping from time points to frequencies inherently assumes that both domains contain the complete knowledge of the system. However, for truncated, noisy time series with background trends this unique mapping breaks down and the question reduces to an inference problem of identifying the most probable frequencies.
Results
In this paper we build on the method of Bayesian Spectrum Analysis and demonstrate its advantages over conventional methods by applying it to a number of test cases, including two types of biological time series. Firstly, oscillations of calcium in plant root cells in response to microbial symbionts are nonstationary and noisy, posing challenges to data analysis. Secondly, circadian rhythms in gene expression measured over only two cycles highlights the problem of time series with limited length. The results show that the Bayesian frequency detection approach can provide useful results in specific areas where Fourier analysis can be uninformative or misleading. We demonstrate further benefits of the Bayesian approach for time series analysis, such as direct comparison of different hypotheses, inherent estimation of noise levels and parameter precision, and a flexible framework for modelling the data without preprocessing.
Conclusions
Modelling in systems biology often builds on the study of timedependent phenomena. Fourier Transforms are a convenient tool for analysing the frequency domain of time series. However, there are wellknown limitations of this method, such as the introduction of spurious frequencies when handling short and noisy time series, and the requirement for uniformly sampled data. Biological time series often deviate significantly from the requirements of optimality for Fourier transformation. In this paper we present an alternative approach based on Bayesian inference. We show the value of placing spectral analysis in the framework of Bayesian inference and demonstrate how model comparison can automate this procedure.
Background
Pattern recognition is central to many scientific disciplines and is often a first step in building a model that explains the data. In particular, the study of periodic phenomena and frequency detection has received much attention, leading to the wellestablished field of spectral analysis.
Biology is rich with (near) periodic behaviour, with sustained oscillations in the form of limit cycles playing important roles in many diverse phenomena such as glycolytic metabolism, circadian rhythms, mitotic cycles, cardiac rhythms, hormonal cycles, population dynamics, epidemiological cycles, etc. [1]. A conventional method for frequency detection is Fourier analysis. It is based on the fact that it is possible to represent any integrable function as an infinite sum of sines and cosines. The Fourier Transform (FT) uses this property to reveal the underlying components that are present in a signal [2]. Fourier theory has given rise to a wide range of diverse developments and farreaching applications, demonstrating the theory's undisputed importance and impact. For frequency detection, however, it is known that the FT works optimally only for uniformly sampled, long, stationary time series. Furthermore, common procedures of preprocessing the data can cause problems. Time series can contain low frequency background fluctuations or drift that are unrelated to the signal of interest. For the FT, it is then necessary to remove the trends using detrending techniques. As has been shown previously, this detrending leads to convolution of the signal that can both remove evidence for periodicity and add false patterns [3]. Another known problem is aliasing. If a signal containing high frequencies is recorded with a low sampling rate, peaks of high frequencies can fold back into the frequency spectrum, appearing as low frequencies [2]. The Gibbs phenomenon provides another example where spurious peaks appear in a FT. It occurs at points of discontinuity in a periodic function, and results in socalled ringing artefacts around the "true" frequency peak [4]. As for the accuracy of the frequency estimate, no direct information of this is given by the output from a FT, since the sharpness of the peaks depends on a combination of factors such as noise levels and the length of the time series. For further details, see the extensive FT literature (e.g. [2,5]).
Wavelet Transforms [610] offer an attractive alternative to Fourier Transforms. The main difference is that they are localised in both the time and frequency domain. This property makes wavelets better adapted to problems with truncated data. Wavelets have found wideranging applications and have proven to be particularly powerful for image processing and data compression [1113].
Bayesian inference provides another approach for analysing data (for an introduction to Bayesian analysis, see [14]). It addresses additional aspects of the problem, such as the inherent uncertainty of the analysis and the effects of external noise. Using this framework [3], the method of Bayesian Spectrum Analysis (BSA) was developed by Bretthorst [15] and applied to Nuclear Magnetic Resonance (NMR) signals and parameter estimation with great success [16,17].
There are several advantages of the Bayesian approach, including an inherent mechanism for estimating the accuracy of the result and all parameters, as well as the ability to compare different hypotheses directly. Focus is shifted to the question of interest by integrating out other parameters and noise levels. Initial knowledge of the system can be incorporated in the analysis and expressed in the prior probability distributions. There has been a recent flood of Bayesian papers with some convincing applications and promising developments in systems biology (see [1830], and many others). The Bayesian approach to time series analysis has proven its value in fields such as NMR and ion cyclotron resonance analysis (e.g. [31] and [32]).
In this paper, we describe the development, implementation and testing of Bayesian model development coupled with BSA and Nested Sampling, in a biological context. We present a comparison of this approach with the FT, applied to a number of simulated test cases and two types of biological time series that present challenges to accurate frequency detection. We first present some necessary background, upon which we build to develop to our approach.
Bayesian inference
Data is rarely available in sufficient quantity and quality to allow for exact scientific deduction. Instead we are forced to infer models from incomplete knowledge. Bayesian inference is based on Bayes' Rule, which evaluates a hypothesis, H, in light of some data, D, and some prior information, I. It is a method of assigning probabilities based on the current state of knowledge, allowing for subsequent reevaluation as new data becomes available. The goal is to determine P (H D, I), the posterior probability distribution of the hypothesis, given the data and the prior information. With Bayes' rule, the posterior can be expressed in terms of other probabilities as
where P(DH, I) is the probability of observing the data given the hypothesis and the prior, P(HI) is the prior probability of the hypothesis, and P(DI) is the probability of the data given the prior. When the hypothesis is the variable and the data is held constant, P(DH, I) is called the likelihood function, and when the hypothesis is constant it is called the probability of obtaining a specified outcome (data). When evaluating only one hypothesis, P(DI) is a normalising constant, but when investigating more than one hypothesis this term plays a key role and is called the evidence [33].
Bayesian Spectrum Analysis
Our presentation in this section follows closely that of Bretthorst [15]. The aim is to infer the most probable frequency (or frequencies), ω, from the given data. We start by building a model (the hypothesis H) for the observed data, parameterised by angular frequency, ω, and amplitudes, c, and then use Bayes' rule to compute the posterior probability of the parameters, P(ω, cD, H, I). By assigning priors to the model parameters c and integrating over these, we arrive at the posterior probability for the parameter of interest, ω, . This is referred to as the marginal posterior probability of ω. We note that ω is an rtuple, {ω_{1}, ω_{2}, ..., ω_{r}}, with as many elements as there are distinct frequencies in the data.
A general model for observed data sampled at N discrete time points, D = {d(t_{1}), ..., d(t_{N})}, includes the signal of interest, s(t_{i}), a possible background function, g(t_{i}), and the noise present in the system, e(t_{i}),
The signal function will usually be unknown and may be complicated, but can be approximated by a linear combination of m_{s} model functions, ψ_{i}, that we parameterise by the quantity of interest, ω:
in which are the expansion coefficients.
Similarly, the background function, g(t_{i}), can be approximated by a set of m_{g }functions, ζ_{i}, that are independent of ω,
where are the background model function expansion coefficients.
Since a and b are not the main focus of the analysis, we will aim to integrate them out of the equations by marginalisation. Parameters that are treated in this manner as often referred to as nuisance parameters, which we denote here by . Although the signal function depends on ω, whereas the background function does not, for notational purposes we introduce the set of model functions, ϕ_{i}, which consists of both ψ_{i }and ζ_{i}. This allows us to condense the model equation into
such that now d(t_{i}) = f(t_{i}) + e(t_{i}) and m is now the total number of model functions, m_{s }+ m_{g}. The model functions will typically not be orthogonal functions over the time series domain. This, however, can be achieved by Cholesky decomposition. In all subsequent calculations an orthogonal basis is used. From Bayes' rule, the joint probability distribution of the model for the parameters ω and c is
The likelihood function, P(Dω, c, H, I), is calculated by comparing data produced by the model signal, equation (5), to real experimental data. If the model perfectly captures the signal, the difference between the model data and the real data is simply the noise in the system. The model of the data in equation (2) includes noise, e(t_{i}), which we assume to be time independent in the further developments. The true noise level is unknown, but for a given noise power, σ^{2}, the principle of maximum entropy leads to the use of a normal distribution,
A noise model of this form ensures that the accuracy of the results is maximally conservative for a given noise power. We will later integrate over all possible noise levels to remove the dependence on σ. With the described signal and this noise model, the likelihood was calculated by Bretthorst [15] to be
where N is the number of data points, and
where is the meansquare of the data, , and Φ_{jk }is the matrix of the model functions, .
The goal of the analysis is to compute the posterior probability for frequencies in the data, i.e. to go from the joint probability distribution to a posterior probability of ω, independent of the other parameters. By integrating over all possible values of the parameters σ and c, the remainder is the marginal posterior of the parameters of interest, ω={ω_{1}, ω_{2}, ..., ω_{r}}. This is an essential advantage of the Bayesian framework, allowing the analysis to focus on estimating the parameters of interest, regardless of the values of the others. If necessary, the other parameters can be estimated at a later point.
To integrate over the σ and c values, priors must first be assigned to them. We chose uniform priors for c and ω, representing complete lack of knowledge. We know that σ is continuous and must be positive, and in such cases a Jeffreys prior is appropriate, P(σI) = 1/σ. Both the uniform distribution over continuous variables and the Jeffreys prior are known as improper priors if bounds are not specified as they cannot be normalised. For more information on prior assignment see [3,15].
Using the general model, equation (5), assigning the priors, calculating the likelihood function, equation (8), and integrating out the amplitudes and noise parameters, the posterior probability distribution of ω is proportional to
where h is the projection of the data onto the orthonormal model functions, , and is the meansquare of the h_{j}, , [15]. This expression of the posterior allows us to identify the strongest frequencies present in the data. For a good model, there will be a high probability peak in the posterior distribution at that ω = {ω_{1}, ω_{2}, ..., ω_{r}}.
Results and Discussion
We employed the framework developed by Jaynes [3] and Bretthorst [15] to investigate the frequency components in a number of biological time series.
Model comparison
After evaluating the probability of parameters in light of a certain hypothesis, it is important to question the validity of that hypothesis. Thus, the next step in Bayesian inference is to compare the probability of different hypotheses. The hypothesis is now a particular model of the signal, H_{i}, out of a set of M models {H_{1}, ..., H_{M}}, and using Bayes' Rule, the posterior probability of this model is
Then two different models, H_{i }and H_{j}, can be compared by taking their ratios,
The probability of the data given our prior information, P(DH_{i}, I), which was a normalisation constant in equation (6), will now vary between models, and is called the evidence. It evaluates the fit of the data to the model, whilst penalising models that include more parameters. Each additional model parameter should be followed by a significant increase in probability, otherwise the simpler model is preferred. Thus, Bayesian model comparison naturally follows the principle of Occam's razor [33,34].
Model development
It will often not be obvious which function to choose to model trends in the data, so an approach using basis functions and expanding these to different orders will be of advantage, as in equation (4). Each expansion represents a different model, H_{i}, and these can be compared using inference techniques. Likewise, different functions for capturing the signals in the data and modelling a different number of signals correspond to different models for data. Following [3,14,33], we use the posterior ratio to evaluate different models.
This model ratio can be used to determine the number of background model functions for each time series. The posterior probability ratio is calculated between model H_{n }and H_{n+1}, where H_{n }is a model including n background functions. To obtain the model ratio, priors are assigned to each of the models and their likelihood functions are calculated. Assigning equal prior probability to all models reduces this to the ratio of evidences. To compute the evidences we need to integrate the likelihood, P(Dω, σ, c, H_{n}, I) from equation (8), over ω, σ, and c for each model H_{n}. By assigning proper normalised priors to all model parameters it is possible to integrate over them around the maximum likelihood estimate. Following Bretthorst's derivation for location parameters [15], we assign Gaussian priors to the amplitudes with hyperparameters for the variances. Since the variances are scale parameters, they are subsequently assigned Jeffreys' priors with an upper and lower bound. This allows us to normalise them and integrate, leaving the defined bounds as parameters in the final equation. For models with the same bounds these terms cancel out in the model ratio. The evidence for a given model, H_{n}, was calculated by Bretthorst [15] to be
where δ, γ and σ are the prior variances for amplitudes, frequencies and noise, respectively, R_{δ}, R_{γ }and R_{σ }are the ratios of the integral bounds for these variances, is the meansquare projection of the data onto the orthonormal model functions at the maximum likelihood point for model H_{n}, is the meansquare of the ω value that maximises the likelihood, , and r is the number of ω parameters, ω = {ω_{1}, ..., ω_{r}}. The Jacobian is obtained by orthogonalising the Taylorexpansion of around the maximumlikelihood point, . See Bretthorst for further details [15]. For cases in which the number of frequencies in the data exceeds the dimension of omega, for instance multiple frequency data with a single frequency model, the above approximation for the evidence is illsuited as the posterior will cease to be unimodal. For such scenarios, either multiple expansions or MCMC offer attractive solutions to marginalisation. For comparison we have included results from Nested Sampling [14,35] as a means to perform the integration and compute the evidence. Nested Sampling is a variant of MCMC that employs a likelihood based sorting of sample points to efficiently guide the search strategy of the posterior distribution [14,35].
When the model ratio, H_{n}/H_{n}_{+}_{1}, becomes greater than 1, the simpler model, H_{n}, is favoured over H_{n+1 }[33]. Adding more background functions than are justified by the data (based on the posterior model ratio) may lead to a lower probability for the frequency and in some cases possibly a location shift.
This model development approach used for the background functions above can also be used to decide on the number of underlying frequencies in the data. The model ratios of a time series containing one frequency (case A) and a time series containing two (case B) are presented in Table 1, analysed with both a one and twofrequency model. The results show, as expected, a preference for the onefrequency model in case A, and for the twofrequency model in case B.
Table 1. Model ratios for number of frequencies in the data
We point out that the proposed method stops once the current best model has been found but is not guaranteed to find the global maximum from a predefined set of models. The procedure is thus part of model development rather than model selection. If the set of hypotheses are known in advance then the posterior ratios over the full set should be used to find the best model.
Testing
We first show the power of the BSA approach on test cases using simulated data. In these tests, we sought to recover known input parameters from the simulated data, to validate the BSA approach. We employed sines and cosines as model functions (ψ_{j }in equation (3)). For comparison, Discrete Fourier Transforms were computed using Fast Fourier Transforms (FFT) [36]. In the test cases, we varied key parameters such as noise levels, trace length, sampling intervals, amplitudes, frequencies, background trends and shape of oscillations.
Representative cases of noise levels and background trends are shown in Table 2, including FFT results on the same data set. A key observation is that the Bayesian approach extracts the correct answer from the data with high precision. BSA also computes the signaltonoise ratios which is a useful indication of how much of the data cannot be accounted for in the model. Furthermore, the amplitudes do not impact the BSA results since they are integrated out.
Table 2. BSA and FFT results from simulated harmonic data with noise and background trends
BSA has a clear advantage over FFT when the data is nonuniformly sampled. FFT requires uniform sampling, whilst BSA is less stringent and delivers the correct result with higher precision. Bretthorst also noted that nonuniformly sampled data removes aliases from the frequency domain, another significant advantage [15]. Five further distinct cases emerged from the tests in which BSA delivers superior results to FFT: time series which have background trends, few data points, high noise levels, multiple frequencies, and nonharmonic oscillations.
Background trends
Additional file 1, Figure S1, is an example of a time series with a strong background trend. In Table 3, the model ratios for different background functions are shown. The ratio is initially well below 1, but the ratio of models with expansion orders of two and three Legendre polynomials is above 1. Thus, background functions of Legendre polynomials to expansion order two is more likely, and should be used in the estimation of ω.
Additional File 1. Time series with background trend. Time series including a background trend, simulated from d(t) = sin(ωt)  0.005t^{2 }+ e, with ω = 0.5 rad/s, sampled with 1 s intervals to give 200 points. The noise level, e, is 0.1 which corresponds to 10%.
Format: PDF Size: 9KB Download file
This file can be viewed with: Adobe Acrobat Reader
Table 3. Automated model development
Examples 810 in Table 2 also include trends, and without preprocessing FFT cannot pick out the correct frequency. In contrast, BSA includes background functions in the model signal and delivers the desired result. Including background functions, however, results in overestimation of the signaltonoise ratio.
Short time series
Additional file 2, Figure S2, shows the results from analysing a short time series. The FFT power spectrum is very broad (Additional file 2, Figure S2B), which comes as no surprise given the FFT dependence on the number of data points. BSA estimates the correct frequency sharply, but the maximum probability drops compared to longer time series (Additional file 2, Figure S2C). This demonstrates the higher uncertainty associated with fewer time points.
Additional File 2. Short time series. A: A short time series simulated from d(t) = sin(ωt) + e, with ω = 0.5 rad/s, and sampled with 1 s intervals to give 20 points. The noise level, e, is 0.1 which corresponds to 10%. B: FFT results. The yaxis is the spectral power, S. C: BSA result. P denotes the posterior probability. The BSA estimate of ω is correct, and has considerably less spread than the FFT estimate.
Format: PDF Size: 16KB Download file
This file can be viewed with: Adobe Acrobat Reader
High noise levels
BSA is also successful at handling high levels of noise, as highlighted in Examples 16 in Table 2. The frequency estimates are correctly reproduced by the FFT. In these simple test scenarios, the BSA posterior probability distribution estimates the frequency with a significantly higher precision than the FFT. Whereas the estimated uncertainty of parameter expectation values is a builtin aspect of any probabilistic treatment such as BSA, FFT has no inherent mechanism for assessing the accuracy of the results. The FFT output is summarised by the average, , and the standard deviation, σ_{FFT}, over the transformed data set. We show that different noise levels influence the σ_{FFT }more than the σ_{BSA }(Figure 1).
Figure 1. Effects of noise on precision. The effect of noise on σ for FFT (A) and BSA (B). The time series were simulated from d(t) = sin(ωt) + e, with ω = 0.5 rad/s and sampled with 3 s intervals to give 100 points. The noise level, e, varies between 0 and 100%. Although the qualitative behaviour of noise on the precision is the same for FFT and BSA in this example, BSA produces values of σ that are two orders of magnitude below the FFT values.
Multiple frequencies
Example 7 in Table 2 has two frequencies present in the data. Both BSA and FFT show these two peaks in the resulting plots. Although BSA can be used in this manner with a onedimensional ω to scan through frequency space and estimate the number of frequencies in the data and their location, if more than one frequency is present, the model should be extended to reflect this. Without this extension the integration procedure around a single point is not well suited, so we employed Nested Sampling to compute the marginalisation in these cases. For the extension approach, when the posterior probability over ω = {ω_{1}} reveals two strong frequencies, then a better model would be ω = {ω_{1}, ω_{2}}. For example, Additional file 3, Figure S3, shows BSA and FFT results for a test case that includes higher harmonics which give rise to multiple peaks in the log P plot. If more than one peak in the resulting posterior probability emerges, then the model can be extended further. One peak in the posterior probability over the number of modelled frequencies signifies that the correct number of frequencies has been captured.
Additional File 3. Higher harmonics. A: Time series with higher harmonic frequencies, simulated from d(t) = sin(ωt) + sin(3ωt) + sin(5ωt), with ω = 0.1 rad/s, sampled with 1 s intervals to give 200 points. B: FFT results. D: log(Probability) plot of the BSA results. Both BSA and FFT show three strong peaks in ω. Depending on the length of the series, the truncation, and sampling interval not all peaks will result in an equal probability. C: Probability plot of the BSA results. In the current case, the question of which single frequency is the most probable results in the selection of the 3ω frequency.
Format: PDF Size: 19KB Download file
This file can be viewed with: Adobe Acrobat Reader
As another example, Additional file 4, Figure S4, shows the result of a twofrequency search. The BSA posterior probability distribution is now twodimensional with a peak at the two correct frequencies (Additional file 4, Figure S4C). The FFT results are also shown (Additional file 4, Figure S4B).
Additional File 4. Multiple frequencies. A: A time series containing two distinct frequencies, simulated from d(t) = cos(ω_{1}t) + cos(ω_{2}t), with ω_{1 }= 0.3 rad/s and ω_{2 }= 0.5 rad/s, sampled with 1 s intervals to give 250 points. B: FFT results. The yaxis shows the spectral power, S. C: BSA result. Each point in this plot has two frequencies, so only offdiagonal elements correspond to two distinct frequencies and only if both are present in the data will a high joint probability emerge. Both approaches detect the correct frequencies.
Format: PDF Size: 18KB Download file
This file can be viewed with: Adobe Acrobat Reader
Additional file 5, Figure S5A, shows a time series with a high noise level and two very close frequencies of 0.498 and 0.505 rad/s. FFT cannot distinguish them and shows only one peak (Additional file 5, Figure S5B). BSA breaks the resolution and precision limitations inherent to FFT by introducing a continuous probability distribution instead of the fixed number of points and can therefore sample the posterior more finely in areas of high probability. This approach gives rise to a highresolution probability plot in which two distinct frequencies emerge (Additional file 5, Figure S5D). The peaks have a larger variance at this local level, but the qualitative information of two underlying frequencies is revealed.
Additional File 5. Multiple close frequencies with noise. A: A time series containing two close frequencies, ω_{1 }= 0.498 rad/s and ω_{2 }= 0.505 rad/s, simulated from d(t) = cos(ω_{1}t) + cos(ω_{2}t) + e, and sampled with 1 s intervals. The noise level, e, is 0.1, which corresponds to 10%. B: The FFT result only shows one peak due to the sampling resolution that is determined by the time domain data. In the yaxis, S stands for spectral power. C: The BSA estimate using a two frequency model. Each point in this plot has two frequencies, so only offdiagonal elements correspond to two distinct frequencies and only if both are present will a high joint probability emerge. D: Sampling in the area around the peak of high probability show that two distinct frequencies emerge in strong offdiagonal peaks.
Format: PDF Size: 29KB Download file
This file can be viewed with: Adobe Acrobat Reader
To develop BSA further, we used windowing of the time series to compute the posterior probability distribution of ω at each time point. We call this BSA Local (BSAL). The robustness and negligible peak broadening of BSA with fewer time points allows for this windowing to proceed without the introduction of artefacts due to truncation. This local BSA captures changes in frequency, as shown in Figure 2B and Figure 3B. The BSAL was compared to ShortTime Fourier Transform (STFT)(Figure 2D and Figure 3D), which is a windowed Fourier Transform, and to wavelets (Figure 2C and Figure 3C). For the wavelet power spectrum a Morlet mother wavelet was used [37]. The advantages of BSAL are that it remains within the same BSA framework, has high accuracy, and does not require preprocessing of the data.
Figure 2. Frequency changing over time. A: A time series with a changing frequency, simulated from with ω_{1 }= 0.25 rad/s. B: The posterior probability distribution for the estimated frequency using a local BSA, BSAL, with a window size of 20. The distribution is so narrow that it resembles a sharp line in the plot. C: The wavelet power spectrum using a Morlet mother wavelet [37], with a lower cutoff at ω = 0.1 rad/s in the spectrum. The wavelet reproduces the correct result but with a broad distribution of frequencies. D: The STFT result with a window size of 50, and an overlap of 40, is similar to the wavelet results.
Figure 3. One sharp frequency change. A: A time series with a sharp change in frequency half way through the observed time frame, simulated from d(t) = sin(ωt) with ω_{1 }= 0.2 rad/s and ω_{2 }= 0.4 rad/s. B: The posterior probability distribution for the estimated frequency using a local BSA, BSAL, with a window size of 10, which is one third of the number of data points in a period of the first frequency. Nevertheless, the resulting distribution is so narrow that it resembles a sharp line. C: The wavelet power spectrum using a Morlet mother wavelet [37], with a lower cutoff at ω = 0.1 rad/s in the spectrum. D: The STFT power spectrum, window size of 100 and overlap of 5. Both the wavelet and STFT results gives the correct answer but with high variance.
Nonharmonic oscillations
BSA results for oscillations with a nonharmonic shape are superior to the FFT. It highlights an essential difference in the two methods since biological data is often repetitive, but with a wide range of oscillatory patterns. To demonstrate this further, Figure 4A shows a time series simulated from an ordinary differential equation (ODE) model of cellular calcium (Ca^{2+}) signals [38]. Such time series presents two potential problems: the time series is chaotic and thus not perfectly periodic, and the signal shape is nonharmonic. The calculation of interspike intervals (ISI) of the time series show that multiple intervals are present (Figure 4D). The highest peak of the FFT plot (Figure 4B) suggests that the entire time series is one period, while BSA suggests a strong angular frequency around 1.2 rad/s (Figure 4C). The BSA suggestion is similar to the second FFT peak.
Figure 4. Nearperiodic oscillations. A: A nearperiodic time series simulated from a set of ODEs describing Ca^{2+ }oscillations in animal cells [38]. In the yaxes, n stands for the number of interspike intervals (ISI) in the ISI plot, P for probability in the BSA plot and S for spectral power in the FFT plot. The ISI plot show several intervals present (D), but both FFT (B) and BSA (C) identify a frequency around ω = 1.2 rad/s.
This highlights the differences between frequencies in the data and spike intervals. ISI are a common way of characterizing spike data, however, multiple ISI need not correspond to multiple frequencies in the data. Of the four strong ISI shown here, both BSA and FFT identify only one of these as a regular period.
Summary
After extensive test cases we find that BSA delivers superior results in cases where the FFT assumptions are too constraining, most notably in the five cases above. BSA is a flexible method allowing the underlying hypothesis to be changed depending on the focus of the analysis, and to directly compare the validity of different hypotheses. It can handle nonuniformly sampled data and has no need for preprocessing procedures. The price of these superior results comes at a computational cost that ranged from tens to hundreds of seconds for the examples shown here.
Calcium spiking data
The first biological data set comes from intracellular signalling in plantmicrobe interactions. Symbiotic bacteria induce calcium oscillations, called Ca^{2+ }spiking, in legume root cells (for a review, see [39]). These are nonstationary and often noisy time series, causing problems in identifying periodicity. One hypothesis for signal transduction in this system is via frequency encoding [40], so concluding whether there is underlying periodicity, and at what frequency, is of great interest.
The Ca^{2+ }spiking has background trends present due to fluorescence bleaching and cell movements, which are assumed to be unrelated to the underlying signal in the cell. Therefore, accounting for the background functions plays a key role in the analysis. Example time series are shown in Figure 5A. Nine spiking cells from the model legume Medicago truncatula were analysed for an underlying period. The data is obtained by microinjecting a root hair cell with the calcium indicator dyes Oregon Green (responds to Ca^{2+}) and Texas Red (nonresponsive), and exposing the plant to the bacterial signal molecule that induces the oscillations. The data is a ratio of the fluorescence from the two dyes, showing changes in Ca^{2+ }concentrations. The data has been published in [41].
Figure 5. Example results of data from calcium spiking. A: Time series of Ca^{2+ }oscillations measured in a M. truncatula root hair cell in response to a bacterial signal molecule. The data is a relative ratio of the fluorescence of the Ca^{2+ }indicator dyes Oregon Green and Texas Red, showing changes in concentration, [Ca^{2+}]. The measurements are taken with 5 s intervals. BSA identifies one strong frequency at a period, T, of about 100 s (C), while MTM (B) and FFT (D) deliver a broad spectrum.
The FFT of the Ca^{2+ }data results in a very broad periodogram, due to multiple frequencies and high noise levels (Figure 5D). Also, the spiking produces a nonharmonic signal, which might be another problem for the FFT. For comparison, we also present results from the multitaper method (MTM). The MTM is a nonparametric method of spectral analysis that uses tapers to minimize the variance in the power estimate, and is targeted at short and noisy time series [42]. The MTM results were very similar to the FFT (Figure 5B). These periodograms do not address the question of interest: Is there a key period in the Ca^{2+ }signal? In the BSA analysis (Figure 5C), the Ca^{2+ }spiking data used the Legendre background functions to an expansion order of 12, depending on the individual trace. Nested Sampling was used to compute the evidences. Frequencies with high probabilities were picked out, but varied in the interval of approximately 50120 s (Table 4). However, the strongest periods were in the interval of 75100 s. If periodicity plays a role in the signal transduction of this system, then the key period should be in this interval. The signaltonoise ratios were relatively high, between 100200, possibly as a consequence of including several background functions.
Table 4. BSA on calcium data
Circadian data
The second biological data set shows gene expression of socalled clock genes. Many processes in plants follow a circadian rhythm (for reviews see e.g. [43] or [44]). A number of genes in Arabidopsis thaliana have been shown to regulate circadian rhythms, and time series of RNA levels show how these clock genes are expressed in cycles [45]. Time series with only a couple of cycles are common in biology and provide another suitable test case.
For these circadian rhythms, we chose to analyse RTPCR data from four clock genes in two genotypes of A. thaliana. The plants are either wild type, FRI;FLC, or mutants, fri;flc, of the genes FRI and FLC. The RNA was extracted from seedlings, and each time series is an average of two biological replicates. An example of the RNA levels of a clock gene is shown in Figure 6A. The data has been published in [45]. FFT on the RNA levels of these clock genes did not give any clear periods, either having only a vague peak or none at all (Figure 6D). This is caused by the FFT's dependence on the length of the time series, which in this case was only 12 cycles. The MTM method had more of a peak in the 2025 h period, but still lacking in precision (Figure 6B). BSA on the other hand provides a clear peak close to 23 h (Figure 6C), consistently for all eight time series (Table 5). Nested Sampling was used to compute the evidences. The assigned probabilities are relatively low, but the signaltonoise ratios were between 24, and similar probabilities were obtained using simulated data with few data points and high noise levels. The period peaks are very stable over all the time series, and suggests a probable period which is unaffected by the mutations (fri;flc). This is in agreement with the original conclusions of the experiment [45].
Figure 6. Example results of data from a clock gene's RNA levels. A: RNA levels of a clock gene, TOC1, in a fri;flc background of A. thaliana seedlings. The data points have 2 h intervals and consist of two averaged biological replicates. The values are normalised to the average of the housekeeping gene ACTIN2 in wild type. BSA gives a clear result with a period, T, close to 23 h (C), while MTM (B) and FFT (D) struggle with the short time series and give no clear result.
Table 5. BSA on circadian data
Conclusions
Bayesian inference offers a powerful way of analysing biological time series. Despite the undisputed value of Fourier theory, there are cases when the necessary requirements for its optimality for time series analysis are not met. This is a consequence of the underlying assumptions of a Fourier Transform, causing it to work optimally only for uniformly sampled, long, stationary, harmonic signals that have either no or white noise. In biology these requirements are rarely fulfilled, requiring preprocessing of the data, such as noise reduction and detrending techniques, with the risk of convoluting the signal and losing valuable information.
By placing the problem of frequency extraction in the framework of Bayesian inference, the known and welldocumented problems of Fourier analysis can be overcome. This approach also breaks the resolution and precision limitations inherent to the FFT by introducing a continuous probability distribution instead of the fixed number of points maintained by the discrete Fourier Transform. As we demonstrated here, BSA coupled with automated model development can give superior results to the FFT when faced with short, noisy time series, nonstationarity and nonharmonic signals. The suggested automated model development worked well in our hands but must be used with caution in practice as the approach is not guaranteed to find a global optimum in model space. Alternate models should be explored and compared using posterior probability ratios or approximations thereof. We found Nested Sampling [14] to provide a powerful means of estimating evidences for cases in which a single peak could not be identified. Other MCMC techniques such as simulated annealing running in parameter exploration mode or standard MetropolisHastings algorithms offer attractive alternatives [33].
BSA calculates signaltonoise ratios, provides parameter precision estimates, and can handle high noise levels as well as background trends and therefore has no need for preprocessing. More importantly, the Bayesian framework offers flexibility in the underlying model and enables direct comparison of hypotheses. The work presented here is a merely a first step in this direction. We have employed conservative priors (uniform, Jeffreys, Gaussian) that make an analytical treatment tractable but in some cases more information could warrant a different choice of prior that might require substantial alternations to our approach to handle the numerics of marginalisation.
There are many known examples in biology in which oscillations play a key role and methods for their detection will be of value, especially in cases where subtle differences are of importance and for short, noisy time series. In the presented examples, we demonstrated the improvements that can be gained from employing this approach. Although in these cases, the biological conclusions would not have changed, one can envision scenarios in which a higher accuracy in frequency detection may allow subtle changes to be detected, which may otherwise have been swamped by noise and less powerful techniques. We believe that the presented methodology offers an attractive alternative to other approaches and will be a useful addition to the toolbox of systems biologists.
Methods
All programming was done using Octave [46], which is freely available and compatible with the widely used MATLAB^{®}. The Octave code is freely available from the authors upon request.
FT
The DFT was computed using the fft function in Octave. The results are presented in a power spectrum, to analyse which component carries the most power. This is also know as a periodogram , i.e. the squared absolute value of the FFT of the data d, normalised over the number of data points N [2]. For time series with strong trends, detrending was done before the FFT, using the moving average method [47].
There are a number of sophisticated FT methods beyond the standard FFT, developed to avoid specific problems. For example, we also present results from the multitaper method (MTM), shorttime Fourier Transforms (STFT) and wavelet analysis. For the MTM, only the MTM spectrum is presented, but it should be noted that the Singular Spectrum Analysis  MultiTaper Method (SSAMTM) toolkit provides additional features such as significance levels of the frequencies, relative to the estimated noise levels [42]. The STFT power spectrums were computed with the specgram function in Octave. The wavelet results were computed using software provided by Dr. C. Torrence and Dr. G. Compo, and is available online http://atoc.colorado.edu/research/wavelets/ webcite. This wavelet software also provides additional tools such as significance levels [37].
BSA
A flowchart of the BSA code is shown in Figure 7. The first step is to specify appropriate model and background functions. We employed sines and cosines as model functions (ψ_{j }in equation (3)), and Legendre polynomials as background functions (ζ_{j }in equation (4)). Legendre functions are convenient as they form a basis that can be scaled to be orthogonal over the time domain and offer a level of detail that increases with expansion order. The software, however, will attempt to orthogonalise any given set of functions over the range determined by the data by Cholesky decomposition, so other functions can be employed.
Figure 7. BSA and automated background function determination. Flowchart of the automated model development procedure for BSA. We point out that the proposed method for detecting the best number of background functions may give rise to local rather than global solutions for complex background trends and/or poor choices of background basis functions.
The next step is to specify the frequency domain of interest. This domain is then sampled with a chosen interval, and the posterior probability is computed at each frequency. Since the ω values are sampled over the frequency domain of interest with a chosen interval, the most probable frequency from this set may have a close neighbour with even higher probability, but which fell between sampling points. To avoid this, the NelderMead optimisation technique was used to find the maximum of equation (10) [48]. Subsequently, the area surrounding this peak is finely sampled, to achieve a better representation of the posterior probability distribution of ω. The number of maxima should be checked and if proceeding with a multimodal distribution, an MCMC technique such as Nested Sampling should be used instead of the described marginalisation method. The outputs from the BSA algorithm are the posterior probability distribution of ω, a signaltonoise ratio distribution and a power spectrum.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
RJM and GEDO conceived the experiments. RJM and EG developed and implemented the code. EG performed all the tests and analyses. EG and RJM wrote the paper. All authors read and approved the final manuscript.
Acknowledgements
We thank Dr David Richards and Nick Pullen for critical reading of the manuscript and useful suggestions. We thank Dr Jongho Sun for the provision of calcium spiking data. Thanks are also due to four anonymous reviewers for their detailed, constructive criticism and insightful comments. We would like to thank the Free Software Foundation and all authors of software packages who generously make their tools freely available (LATEX, gnuplot, emacs, Octave, gcc, and many many others). EG acknowledges PhD funding from the John Innes Foundation. RJM and GEDO are grateful for support from the BBSRC.
References

Goldbeter A: Biochemical Oscillations and Cellular Rhythms: The Molecular Bases of Periodic and Chaotic Behaviour. Cambridge University Press; 1997.

Bracewell RN: The Fourier transform and its applications. 2nd edition. New York: McGrawHill; 1978.

Jaynes ET, Bretthorst GL: Probability theory: the logic of science. Cambridge, UK: Cambridge University Press; 2003.

Kammler DW: A First Course in Fourier Analysis. 2nd edition. Cambridge: Cambridge University Press; 2007.

Mallat SG: [http://www.loc.gov/catdir/toc/els033/99065087.html] webcite
A wavelet tour of signal processing. 2nd edition. San Diego: Academic Press; 1999.

Heil C, Walnut DF: Fundamental papers in wavelet theory. Princeton, N.J.: Princeton University Press; 2006.

Daubechies I: [http://www.loc.gov/catdir/enhancements/fy0664/92013201d.html] webcite
Ten lectures on wavelets. Philadelphia, Pa.: Society for Industrial and Applied Mathematics; 1992., 61

Prasad L, Iyengar SS: [http://www.loc.gov/catdir/enhancements/fy0744/97011042d.html] webcite
Wavelet analysis with applications to image processing. Boca Raton: CRC Press; 1997.

Petrosian AA, Meyer FG: [http://www.loc.gov/catdir/toc/fy02/2001045957.html] webcite
Wavelets in signal and image analysis: from theory to practice. Dordrecht: Kluwer Academic; 2001., 19

Davis RA, Charlton AJ, Godward J, Jones SA, Harrison M, Wilson JC: Adaptive binning: An improved binning method for metabolomics data using the undecimated wavelet transform. [http:/ / www.sciencedirect.com/ science/ article/ B6TFP4M644Y71/ 2/ d43b96507a5f622e179566a30b025732] webcite
Chemometrics and Intelligent Laboratory Systems 2007, 85:144154. Publisher Full Text

Ricke J, Maass P, Lopez Hänninen E, Liebig T, Amthauer H, Stroszczynski C, Schauer W, Boskamp T, Wolf M: Wavelet versus JPEG (Joint Photographic Expert Group) and fractal compression. Impact on the detection of lowcontrast details in computed radiographs.
Invest Radiol 1998, 33(8):45663. PubMed Abstract  Publisher Full Text

Lucier BJ, Kallergi M, Qian W, DeVore RA, Clark RA, Saff EB, Clarke LP: Wavelet compression and segmentation of digital mammograms.
J Digit Imaging 1994, 7:2738. PubMed Abstract  Publisher Full Text

Sivia DS, Skilling J: Data analysis: a Bayesian tutorial. 2nd edition. Oxford: Oxford University Press; 2006.

Bretthorst GL: Bayesian spectrum analysis and parameter estimation. Lecture notes in statistics, New York: SpringerVerlag; 1988.

Bretthorst GL, Kotyk JJ, Ackerman JJ: 31P NMR Bayesian spectral analysis of rat brain in vivo.
Magn Reson Med 1989, 9(2):2827. PubMed Abstract  Publisher Full Text

Neil JJ, Bretthorst GL: On the use of Bayesian probability theory for analysis of exponential decay data: an example taken from intravoxel incoherent motion experiments.
Magn Reson Med 1993, 29(5):6427. PubMed Abstract  Publisher Full Text

Bois FY: GNU MCSim: Bayesian statistical inference for SBMLcoded systems biology models.
Bioinformatics 2009, 25(11):14531454. PubMed Abstract  Publisher Full Text

Kim S, Imoto S, Miyano S: Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data.
Biosystems 2004, 75(13):5765. PubMed Abstract  Publisher Full Text

Klinke DJn: An empirical Bayesian approach for modelbased inference of cellular signaling networks.
BMC Bioinformatics 2009, 10:371. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Morrissey ER, Juarez MA, Denby KJ, Burroughs NJ: On reverse engineering of gene interaction networks using time course data with repeated measurements.
Bioinformatics 2010, 26(18):23052312. PubMed Abstract  Publisher Full Text

Mukherjee S, Speed TP: Network inference using informative priors.
Proc Natl Acad Sci USA 2008, 105(38):1431314318. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Savage RS, Ghahramani Z, Griffin JE, de la Cruz BJ, Wild DL: Discovering transcriptional modules by Bayesian data integration.
Bioinformatics 2010, 26(12):i15867. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Schaber J, Liebermeister W, Klipp E: Nested uncertainties in biochemical models.
IET Syst Biol 2009, 3:19. PubMed Abstract  Publisher Full Text

Secrier M, Toni T, Stumpf MPH: The ABC of reverse engineering biological signalling systems.
Mol Biosyst 2009, 5(12):19251935. PubMed Abstract  Publisher Full Text

Vyshemirsky V, Girolami MA: Bayesian ranking of biochemical system models.
Bioinformatics 2008, 24(6):833839. PubMed Abstract  Publisher Full Text

Vyshemirsky V, Girolami M: BioBayes: a software package for Bayesian inference in systems biology.
Bioinformatics 2008, 24(17):19331934. PubMed Abstract  Publisher Full Text

Wilkinson DJ: Bayesian methods in bioinformatics and computational systems biology.
Brief Bioinform 2007, 8(2):109116. PubMed Abstract  Publisher Full Text

Yoshida R, Nagasaki M, Yamaguchi R, Imoto S, Miyano S, Higuchi T: Bayesian learning of biological pathways on genomic data assimilation.
Bioinformatics 2008, 24(22):25922601. PubMed Abstract  Publisher Full Text

Yoshida R, Saito MM, Nagao H, Higuchi T: Bayesian experts in exploring reaction kinetics of transcription circuits.
Bioinformatics 2010, 26(18):i58995. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Meier JE, Marshall AG: Bayesian versus Fourier spectral analysis of ion cyclotron resonance timedomain signals.
Anal Chem 1990, 62(2):2018. PubMed Abstract  Publisher Full Text

Chylla RA, Markley JL: Improved frequency resolution in multidimensional constanttime experiments by multidimensional Bayesian analysis.
J Biomol NMR 1993, 3(5):51533. PubMed Abstract

MacKay DJC: Information theory, inference, and learning algorithms. Cambridge, UK: Cambridge University Press; 2003.

Toni T, Stumpf MPH: Simulationbased model selection for dynamical systems in systems and population biology.
Bioinformatics 2010, 26:104110. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Skilling J: Nested Sampling for General Bayesian Computation.

Cooley J, Tukey J: An Algorithm for Machine Calculation of Complex Fourier Series.
Mathematics of Computation 1965, 19(90):297301. Publisher Full Text

Torrence C, Compo G: A Practical Guide to Wavelet Analysis.
Bulletin of the American Meteorological Society 1998, 79:6178. Publisher Full Text

Kummer U, Olsen LF, Dixon CJ, Green AK, BornbergBauer E, Baier G: Switching from simple to complex oscillations in calcium signaling.
Biophys J 2000, 79(3):118895. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Oldroyd GED, Downie JA: Coordinating nodule morphogenesis with rhizobial infection in legumes.
Annu Rev Plant Biol 2008, 59:51946. PubMed Abstract  Publisher Full Text

Hazledine S, Sun J, Wysham D, Downie JA, Oldroyd GED, Morris RJ: Nonlinear time series analysis of nodulation factor induced calcium oscillations: evidence for deterministic chaos?
PLoS One 2009, 4(8):e6637e6637. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Kosuta S, Hazledine S, Sun J, Miwa H, Morris RJ, Downie JA, Oldroyd GED: Differential and chaotic calcium signatures in the symbiosis signaling pathway of legumes.
Proc Natl Acad Sci USA 2008, 105(28):98239828. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ghil M, Allen MR, Dettinger MD, Ide K, Kondrashov D, Mann ME, Robertson AW, Saunders A, Tian Y, Varadi F, Yiou P: Advanced spectral methods for climatic time series.

PrunedaPaz JL, Kay SA: An expanding universe of circadian networks in higher plants.
Trends Plant Sci 2010, 15(5):25965. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Gardner MJ, Hubbard KE, Hotta CT, Dodd AN, Webb AAR: How plants tell the time.
Biochem J 2006, 397:1524. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Edwards KD, Anderson PE, Hall A, Salathia NS, Locke JCW, Lynn JR, Straume M, Smith JQ, Millar AJ: FLOWERING LOCUS C mediates natural variation in the hightemperature response of the Arabidopsis circadian clock.
Plant Cell 2006, 18(3):63950. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Brockwell PJ, Davis RA: Introduction to time series and forecasting. 2nd edition. New York: Springer; 2002.

Press WH: Numerical recipes in C++: the art of scientific computing. 2nd edition. Cambridge, UK: Cambridge University Press; 2002.