Table 1 

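Overview of the pretreatment methods used in this study. The Unit column states the unit of the data after pretreatment. O represents the original unit, and () represents dimensionless data. With $x_{ij}$ the value of metabolite $i$ in experiment $j$ ($j = 1, \ldots, J$), the mean is estimated as $\bar{x}_i = \frac{1}{J}\sum_{j=1}^{J} x_{ij}$ and the standard deviation is estimated as $s_i = \sqrt{\frac{\sum_{j=1}^{J}(x_{ij} - \bar{x}_i)^2}{J - 1}}$. $\tilde{x}$ and $\hat{x}$ represent the data after different pretreatment steps.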

| Class | Method | Formula | Unit | Goal | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| I | Centering | $\tilde{x}_{ij} = x_{ij} - \bar{x}_i$ | O | Focus on the differences and not the similarities in the data | Removes the offset from the data | When data are heteroscedastic, the effect of this pretreatment method is not always sufficient |
| II | Autoscaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}$ | () | Compare metabolites based on correlations | All metabolites become equally important | Inflation of the measurement errors |
| II | Range scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{x_{i,\max} - x_{i,\min}}$ | () | Compare metabolites relative to the biological response range | All metabolites become equally important. Scaling is related to biology | Inflation of the measurement errors and sensitive to outliers |
| II | Pareto scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$ | O | Reduce the relative importance of large values, but keep data structure partially intact | Stays closer to the original measurement than autoscaling | Sensitive to large fold changes |
| II | Vast scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i} \cdot \frac{\bar{x}_i}{s_i}$ | () | Focus on the metabolites that show small fluctuations | Aims for robustness, can use prior group knowledge | Not suited for large induced variation without group structure |
| II | Level scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\bar{x}_i}$ | () | Focus on relative response | Suited for identification of e.g. biomarkers | Inflation of the measurement errors |
| III | Log transformation | $\tilde{x}_{ij} = \log_{10}(x_{ij})$, $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | Log O | Correct for heteroscedasticity, pseudo scaling. Make multiplicative models additive | Reduces heteroscedasticity, multiplicative effects become additive | Difficulties with zeros and with values that have a large relative standard deviation |
| III | Power transformation | $\tilde{x}_{ij} = \sqrt{x_{ij}}$, $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | √O | Correct for heteroscedasticity, pseudo scaling | Reduces heteroscedasticity, no problems with small values | Choice of the square root is arbitrary |
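
The methods in Table 1 can be illustrated with a short NumPy sketch. It is not the authors' implementation. It assumes the usual textbook definitions of each method, a data matrix with metabolites in rows and experiments in columns, and the sample standard deviation (divisor J - 1). The function name `pretreat` and the method keys are hypothetical names chosen for this example.

```python
import numpy as np

def pretreat(X, method):
    """Apply one pretreatment method row-wise (hypothetical helper).

    X: array-like, metabolites in rows, experiments in columns (assumed layout).
    method: 'center', 'auto', 'range', 'pareto', 'vast', 'level', 'log', 'power'.
    """
    X = np.asarray(X, dtype=float)

    # Class III: transform first, then center the transformed data.
    if method == "log":
        X = np.log10(X)      # not defined for zeros or negative values
        method = "center"
    elif method == "power":
        X = np.sqrt(X)       # square root as the power transformation
        method = "center"

    mean = X.mean(axis=1, keepdims=True)        # per-metabolite mean
    sd = X.std(axis=1, ddof=1, keepdims=True)   # sample standard deviation
    centered = X - mean                         # Class I: centering

    # Class II: centered data divided by a per-metabolite scaling factor.
    if method == "center":
        return centered
    if method == "auto":
        return centered / sd
    if method == "range":
        span = X.max(axis=1, keepdims=True) - X.min(axis=1, keepdims=True)
        return centered / span
    if method == "pareto":
        return centered / np.sqrt(sd)
    if method == "vast":
        return (centered / sd) * (mean / sd)
    if method == "level":
        return centered / mean
    raise ValueError(f"unknown method: {method}")

# Two metabolites that differ only by a factor of ten in abundance.
X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
centered = pretreat(X, "center")
autoscaled = pretreat(X, "auto")
```

In this toy matrix the two rows differ tenfold after centering but become identical after autoscaling, which is the sense in which all metabolites become equally important.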



van den Berg et al. BMC Genomics 2006 7:142 doi:10.1186/1471-2164-7-142