Email updates

Keep up to date with the latest news and content from BMC Systems Biology and BioMed Central.

Open Access Methodology article

Statistical Inference Methods for Sparse Biological Time Series Data

Juliet Ndukum1, Luís L Fonseca23, Helena Santos2, Eberhard O Voit3 and Susmita Datta1*

Author Affiliations

1 Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA

2 Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Rua da Quinta Grande 6, 2780-156 Oeiras, Portugal

3 Integrative BioSystems Institute and Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA

For all author emails, please log on.

BMC Systems Biology 2011, 5:57  doi:10.1186/1752-0509-5-57

Published: 25 April 2011

Abstract

Background

Comparing metabolic profiles under different biological perturbations has become a powerful approach to investigating the functioning of cells. The profiles can be taken as single snapshots of a system, but more information is gained if they are measured longitudinally over time. The results are short time series consisting of relatively sparse data that cannot be analyzed effectively with standard time series techniques, such as autocorrelation and frequency domain methods. In this work, we study longitudinal time series profiles of glucose consumption in the yeast Saccharomyces cerevisiae under different temperatures and preconditioning regimens, which we obtained with methods of in vivo nuclear magnetic resonance (NMR) spectroscopy. For the statistical analysis we first fit several nonlinear mixed effect regression models to the longitudinal profiles and then used an ANOVA likelihood ratio method in order to test for significant differences between the profiles.

Results

The proposed methods are capable of distinguishing metabolic time trends resulting from different treatments and associate significance levels to these differences. Among several nonlinear mixed-effects regression models tested, a three-parameter logistic function represents the data with highest accuracy. ANOVA and likelihood ratio tests suggest that there are significant differences between the glucose consumption rate profiles for cells that had been--or had not been--preconditioned by heat during growth. Furthermore, pair-wise t-tests reveal significant differences in the longitudinal profiles for glucose consumption rates between optimal conditions and heat stress, optimal and recovery conditions, and heat stress and recovery conditions (p-values <0.0001).

Conclusion

We have developed a nonlinear mixed effects model that is appropriate for the analysis of sparse metabolic and physiological time profiles. The model permits sound statistical inference procedures, based on ANOVA likelihood ratio tests, for testing the significance of differences between short time course data under different biological perturbations.