This article is part of the supplement: Probabilistic Modeling and Machine Learning in Structural and Systems Biology

Open Access Research

A novel Bayesian approach to quantify clinical variables and to determine their spectroscopic counterparts in 1H NMR metabonomic data

Aki Vehtari1*, Ville-Petteri Mäkinen1, Pasi Soininen2, Petri Ingman3, Sanna M Mäkelä4, Markku J Savolainen4, Minna L Hannuksela4, Kimmo Kaski1 and Mika Ala-Korpela1*

Author Affiliations

1 Laboratory of Computational Engineering, Systems Biology and Bioinformation Technology, Helsinki University of Technology, P.O. Box 9203, FI-02015 HUT, Finland

2 Department of Chemistry, University of Kuopio, P.O. Box 1627, FI-70211 Kuopio, Finland

3 Department of Chemistry, Instrument Centre, Vatselankatu 2, FI-20014 University of Turku, Turku, Finland

4 Department of Internal Medicine, Clinical Research Center, University of Oulu, P.O. Box 5000, FI-90014 Oulu, Finland


BMC Bioinformatics 2007, 8(Suppl 2):S8  doi:10.1186/1471-2105-8-S2-S8

Published: 3 May 2007



A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum.


A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were obtained independently for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and showed remarkably good quantitative performance: the predictive R-values were 0.985 for very low density lipoprotein triglycerides (VLDL-TG), 0.787 for intermediate density lipoprotein cholesterol (IDL-C), 0.943 for low density lipoprotein cholesterol (LDL-C), and 0.933 for high density lipoprotein cholesterol (HDL-C). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra. For VLDL-TG and HDL-C in particular, the Bayesian methodology clearly identified the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra.
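
As a rough illustration of what an MCMC-fitted Bayesian regression and a predictive R-value involve, the following minimal sketch may help. It is not the authors' model: it assumes a plain linear regression with a Gaussian prior, a random-walk Metropolis sampler, synthetic data standing in for the spectra, and a simple held-out split. The abstract specifies neither the kernel form nor the validation scheme, so all names and settings here (`log_posterior`, `metropolis`, `noise_sd`, `prior_sd`, the 50/25 split) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's spectra): 75 samples, 5 features.
n_samples, n_features = 75, 5
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([1.5, -0.5, 0.0, 0.8, 0.0])  # assumed "true" weights
noise_sd = 0.3
y = X @ true_w + rng.normal(scale=noise_sd, size=n_samples)

# Assumed validation scheme: fit on 50 samples, predict the other 25.
train, test = np.arange(50), np.arange(50, 75)


def log_posterior(w, X, y, noise_sd=0.3, prior_sd=10.0):
    """Unnormalised log posterior: Gaussian likelihood plus Gaussian prior."""
    resid = y - X @ w
    log_lik = -0.5 * np.sum(resid ** 2) / noise_sd ** 2
    log_prior = -0.5 * np.sum(w ** 2) / prior_sd ** 2
    return log_lik + log_prior


def metropolis(X, y, n_iter=6000, step=0.02, rng=rng):
    """Random-walk Metropolis sampler over the regression weights."""
    w = np.zeros(X.shape[1])
    lp = log_posterior(w, X, y)
    samples = np.empty((n_iter, w.size))
    for i in range(n_iter):
        proposal = w + rng.normal(scale=step, size=w.size)
        lp_new = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_new - lp:
            w, lp = proposal, lp_new
        samples[i] = w
    return samples


samples = metropolis(X[train], y[train])
posterior = samples[2000:]  # discard burn-in draws

# Posterior-mean prediction for each held-out sample.
y_pred = (X[test] @ posterior.T).mean(axis=1)

# Predictive R taken here as the Pearson correlation between
# held-out predictions and the reference values.
predictive_r = np.corrcoef(y_pred, y[test])[0, 1]
print(f"predictive R on held-out samples: {predictive_r:.3f}")
```

Averaging predictions over the posterior draws, rather than plugging in a single point estimate, is what makes the prediction Bayesian in this sketch. The repeated posterior evaluations in the sampling loop also hint at why systematic MCMC analysis is computationally demanding.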


The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics.