Table 1

Overview of the pretreatment methods used in this study. The Unit column states the unit of the data after pretreatment: O represents the original unit, and (-) represents dimensionless data. The mean is estimated as $\bar{x}_i = \frac{1}{J}\sum_{j=1}^{J} x_{ij}$ and the standard deviation is estimated as $s_i = \sqrt{\frac{\sum_{j=1}^{J}(x_{ij} - \bar{x}_i)^2}{J - 1}}$. $\tilde{x}$ and $\hat{x}$ represent the data after different pretreatment steps.

| Class | Method | Formula | Unit | Goal | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| I | Centering | $\tilde{x}_{ij} = x_{ij} - \bar{x}_i$ | O | Focus on the differences and not the similarities in the data | Remove the offset from the data | When data is heteroscedastic, the effect of this pretreatment method is not always sufficient |
| II | Autoscaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}$ | (-) | Compare metabolites based on correlations | All metabolites become equally important | Inflation of the measurement errors |
| II | Range scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{x_{i,\max} - x_{i,\min}}$ | (-) | Compare metabolites relative to the biological response range | All metabolites become equally important. Scaling is related to biology | Inflation of the measurement errors and sensitive to outliers |
| II | Pareto scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$ | O | Reduce the relative importance of large values, but keep data structure partially intact | Stays closer to the original measurement than autoscaling | Sensitive to large fold changes |
| II | Vast scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i} \cdot \frac{\bar{x}_i}{s_i}$ | (-) | Focus on the metabolites that show small fluctuations | Aims for robustness, can use prior group knowledge | Not suited for large induced variation without group structure |
| II | Level scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\bar{x}_i}$ | (-) | Focus on relative response | Suited for identification of e.g. biomarkers | Inflation of the measurement errors |
| III | Log transformation | $\tilde{x}_{ij} = \log_{10}(x_{ij})$, $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | Log O | Correct for heteroscedasticity, pseudo scaling. Make multiplicative models additive | Reduce heteroscedasticity, multiplicative effects become additive | Difficulties with values with large relative standard deviation and zeros |
| III | Power transformation | $\tilde{x}_{ij} = \sqrt{x_{ij}}$, $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | √O | Correct for heteroscedasticity, pseudo scaling | Reduce heteroscedasticity, no problems with small values | Choice for square root is arbitrary |
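
The pretreatment methods in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard definitions of each method: every scaling method first centers the values of one metabolite and then divides by a scaling factor (standard deviation, range, square root of the standard deviation, or mean), vast scaling is taken as autoscaling multiplied by mean over standard deviation, and the transformations are followed by centering. The function names and the example row are hypothetical, and the standard deviation uses the J - 1 denominator.

```python
import math
import statistics


def center(x):
    """Centering: subtract the mean of one metabolite's values."""
    m = statistics.fmean(x)
    return [v - m for v in x]


def _scale(x, factor):
    """Center x, then divide by a scaling factor (factor must be non-zero)."""
    return [v / factor for v in center(x)]


def autoscale(x):
    # statistics.stdev is the sample standard deviation (J - 1 denominator).
    return _scale(x, statistics.stdev(x))


def range_scale(x):
    return _scale(x, max(x) - min(x))


def pareto_scale(x):
    return _scale(x, math.sqrt(statistics.stdev(x)))


def vast_scale(x):
    # Assumed form: autoscaled values multiplied by mean / standard deviation.
    m, s = statistics.fmean(x), statistics.stdev(x)
    return [v * m / s for v in autoscale(x)]


def level_scale(x):
    return _scale(x, statistics.fmean(x))


def log_transform(x):
    # Base-10 log followed by centering; fails on zeros and negative values.
    return center([math.log10(v) for v in x])


def power_transform(x):
    # Square root followed by centering; zeros are allowed.
    return center([math.sqrt(v) for v in x])


row = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values of one metabolite
print(center(row))
print(range_scale(row))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

Each function works on the values of a single metabolite across experiments, so a full data set would be pretreated row by row. A metabolite with zero standard deviation, zero range, or zero mean would cause a division by zero in the corresponding scaling method and needs separate handling.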


van den Berg et al. BMC Genomics 2006 7:142   doi:10.1186/1471-2164-7-142
