Table 1

Statistics of multivariate analysis models demonstrating an association between 1H NMR spectroscopic data and several biological and lifestyle factors.

Model description                               Y variable         Number of LVs  R2X     Q2Y    Significance (P value)

A. PLS                                          ln(U-Cd)           3              0.251   0.237  < 0.01
B. PLS (current smokers excluded)               ln(U-Cd)           5              0.308   0.330  < 0.001
C. PLS (past and current smokers excluded)      ln(U-Cd)           1              0.0729  0.142  < 0.001
D. PLS                                          sex                3              0.241   0.104  > 0.05
E. PLS                                          age                2              0.216   0.224  < 0.001
F. PLS                                          ln(U-NAG)          1              0.054   0.162  < 0.001
G. PLS-DA                                       Smoking history^a  2              0.194   0.185  < 0.01


^a Smoking history was defined as either 1 = never smoked or past smoker (n = 106) or 2 = current smoker (n = 20); one individual did not complete the lifestyle questionnaire.

Spectra that exhibited signs of bacterial contamination, analgesics or ethanol were excluded from these analyses. All variables were mean-centred and scaled to unit variance. NMR data were reduced to 1,127 data points at a resolution of δ 0.01. Sample numbers for the PLS models were n = 127 for A, D, E and F, n = 106 for B and n = 79 for C; for the PLS-DA model (G), n = 126. The number of latent variables in each model was auto-fitted in SIMCA-P+. All models were assessed for validity by Y variable permutation analysis (1,000 permutations; see additional file 1, Figure S4). Scores scatter plots for each multivariate model can also be found in additional file 1 (Figure S5).

Abbreviations: ln(U-Cd), natural logarithm of urinary cadmium; ln(U-NAG), natural logarithm of urinary N-acetyl-β-D-glucosaminidase; LV, latent variable; n, sample number; PLS, partial least squares; PLS-DA, partial least squares discriminant analysis.

R2X is the proportion of variance in the X matrix (i.e. the spectral NMR data) described by the PLS model. Q2Y is the "cross-validated goodness-of-fit", that is, the ability of the PLS model to predict the Y value (ln(U-Cd), sex, age, ln(U-NAG) or smoking status) of a new sample.
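To make the R2X, Q2Y and permutation P value columns concrete, the following Python sketch computes all three for a PLS model with a single latent variable. It is an illustration under stated assumptions, not the SIMCA-P+ procedure used for the table: the function names (`fit_pls1`, `q2y`, `permutation_p`), the 7-fold cross-validation split, the re-scaling of variables within each fold, the 99 permutations and the synthetic data are all choices made here for the example.

```python
import numpy as np

def fit_pls1(X, y):
    """Fit a one-latent-variable PLS model on autoscaled X and centred y."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Xs, y_mean = (X - mu) / sd, y.mean()   # mean-centre, scale to unit variance
    yc = y - y_mean
    w = Xs.T @ yc
    w /= np.linalg.norm(w)                 # X weights
    t = Xs @ w                             # scores of the latent variable
    c = (t @ yc) / (t @ t)                 # Y loading
    p = Xs.T @ t / (t @ t)                 # X loadings
    # R2X: fraction of the X sum of squares reproduced by t p^T
    r2x = 1 - ((Xs - np.outer(t, p)) ** 2).sum() / (Xs ** 2).sum()

    def predict(X_new):
        return y_mean + ((X_new - mu) / sd) @ w * c

    return predict, r2x

def q2y(X, y, folds=7):
    """Cross-validated Q2Y = 1 - PRESS / total sum of squares of y."""
    idx = np.arange(len(y))
    press = 0.0
    for k in range(folds):                 # assumed 7-fold interleaved split
        test = idx % folds == k
        predict, _ = fit_pls1(X[~test], y[~test])
        press += ((y[test] - predict(X[test])) ** 2).sum()
    return 1 - press / ((y - y.mean()) ** 2).sum()

def permutation_p(X, y, n_perm=99, seed=0):
    """Share of Y-permuted models whose Q2Y reaches the observed Q2Y."""
    rng = np.random.default_rng(seed)
    observed = q2y(X, y)
    hits = sum(q2y(X, rng.permutation(y)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Synthetic stand-in for binned spectra (not data from the study)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 30))
y_demo = X_demo[:, :3].sum(axis=1) + 0.5 * rng.normal(size=80)

_, r2x = fit_pls1(X_demo, y_demo)
q2 = q2y(X_demo, y_demo)
p_value = permutation_p(X_demo, y_demo)
print(f"R2X = {r2x:.3f}, Q2Y = {q2:.3f}, permutation P = {p_value:.3f}")
```

With 99 permutations the smallest attainable P value is 0.01; the 1,000 permutations reported for the table allow a finer resolution. A model such as D, whose Q2Y is not clearly above that of the permuted models, would return a large P value in this scheme.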

Ellis et al. BMC Medicine 2012 10:61   doi:10.1186/1741-7015-10-61
