Table 1 

Statistics of multivariate analysis models demonstrating an association between ^{1}H NMR spectroscopic data and several biological and lifestyle factors. 

| Model description | Y variable | Number of LVs | R^{2}X | Q^{2}Y | Significance (P value) |
|---|---|---|---|---|---|
| A. PLS | ln(UCd) | 3 | 0.251 | 0.237 | < 0.01 |
| B. PLS (current smokers excluded) | ln(UCd) | 5 | 0.308 | 0.330 | < 0.001 |
| C. PLS (past and current smokers excluded) | ln(UCd) | 1 | 0.0729 | 0.142 | < 0.001 |
| D. PLS | sex | 3 | 0.241 | 0.104 | > 0.05 |
| E. PLS | age | 2 | 0.216 | 0.224 | < 0.001 |
| F. PLS | ln(UNAG) | 1 | 0.054 | 0.162 | < 0.001 |
| G. PLS-DA | Smoking history^{a} | 2 | 0.194 | 0.185 | < 0.01 |


^{a}Smoking history was defined as either 1 = never smoked and past smoker (n = 106) or 2 = current smoker (n = 20); one individual did not complete the lifestyle questionnaire. Spectra that exhibited signs of bacterial contamination, analgesics or ethanol were excluded from these analyses. All variables were mean-centred and scaled to unit variance. NMR data were reduced to 1,127 data points of δ 0.01 resolution. Sample numbers for PLS models: A, D, E and F, n = 127; B, n = 106; C, n = 79. For the PLS-DA model (G), n = 126. The number of latent variables in each model was autofitted in SIMCA-P+. All models were assessed for validity by Y variable permutation analysis (1,000 permutations; see additional file 1, Figure S4). Scores scatter plots for each multivariate model can also be found in additional file 1 (Figure S5). ln(UCd), natural logarithm of urinary cadmium; ln(UNAG), natural logarithm of urinary N-acetyl-β-D-glucosaminidase; LV, latent variable; n, sample number; PLS, partial least squares; PLS-DA, partial least squares discriminant analysis. R^{2}X is the proportion of variance in the X matrix (i.e. the NMR spectral data) described by the PLS model. Q^{2}Y is the "cross-validated goodness-of-fit": the ability of the PLS model to predict the Y-score (ln(UCd), sex, age, ln(UNAG) or smoking status) of a novel sample.
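As a rough illustration of two quantities named in the footnote, the sketch below shows scaling of a variable to zero mean and unit variance, and a commonly used definition of Q^{2} as 1 − PRESS/SS_total computed from cross-validated predictions. It is not the SIMCA-P+ implementation; the function names `autoscale` and `q2`, the use of the sample standard deviation (n − 1), and the numbers are assumptions made for illustration only, and the cross-validated predictions are taken as already available from some unspecified cross-validation scheme.

```python
from statistics import mean, stdev


def autoscale(column):
    """Mean-centre a column and scale it to unit variance (sample SD, n - 1)."""
    m = mean(column)
    s = stdev(column)
    if s == 0:
        raise ValueError("a constant column cannot be scaled to unit variance")
    return [(v - m) / s for v in column]


def q2(y_true, y_cv_pred):
    """Q2 = 1 - PRESS / SS_total, where y_cv_pred holds cross-validated predictions."""
    y_bar = mean(y_true)
    # PRESS: squared error of predictions made for held-out samples.
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    # SS_total: squared deviation of the observed values from their mean.
    ss_total = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - press / ss_total


# Hypothetical values for illustration only (not data from the study).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_cv = [1.5, 2.0, 2.5, 4.5, 4.5]

scaled = autoscale(y)   # zero mean, unit sample variance
q2_value = q2(y, y_cv)  # PRESS = 1.0, SS_total = 10.0
```

Under this definition, a Q^{2} near 1 means the held-out predictions track the observed Y values closely, while a value at or below 0 means the model predicts no better than the mean of Y.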

Ellis et al. BMC Medicine 2012 10:61 doi:10.1186/1741-7015-10-61