Table 1

An overview of the proposed data analysis workflow.


Detailed tasks


Statement of the problem

• Specify comparisons of interest

• Express comparisons as statistical hypotheses

• Define scope of biological replication

• Restricted scope suitable for screening; expanded scope required for validation

Exploratory data analysis

• Detect mis-identified features

• Remove obvious outliers

• Detect features with missing values

• Choose imputation strategy

Model-based analysis

• Fit linear mixed model per protein

• Restricted scope of biological replication = subjects as fixed effects; expanded scope = subjects as random effects

• Check QQ-plots for Normality

• If deviations, conclusions are approximate only

• Check residual plots for equal variance

• If deviations, use iterative least squares

• Test comparisons of interest

• Adjust p-values per comparison to control FDR

• Quantify protein abundance in conditions or samples of interest

• Use as input to downstream clustering or classification

Design follow-up experiments

• Evaluate power and sample size

• Find minimal sample size for a fold change

• Find minimal fold change for a sample size

Supplementary Table 2 shows MSstats commands for each step.
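
The step "Adjust p-values per comparison to control FDR" can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is one common way to control the FDR; the table itself does not name the procedure. The function name and the example p-values are hypothetical, and this is not the MSstats implementation.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for ONE comparison, one entry per protein.
p_by_protein = {"P1": 0.001, "P2": 0.04, "P3": 0.03, "P4": 0.20}
adj = benjamini_hochberg(list(p_by_protein.values()))
for protein, q in zip(p_by_protein, adj):
    print(f"{protein}: adjusted p = {q:.4f}")
```

The adjustment is applied separately to each comparison, so the list passed in holds the p-values of all proteins tested for that one comparison.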

Clough et al. BMC Bioinformatics 2012 13(Suppl 16):S6   doi:10.1186/1471-2105-13-S16-S6
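
The two tasks under "Evaluate power and sample size" in Table 1 are inverses of each other, which a short sketch can make concrete. It assumes a simple two-group comparison on the log2 scale with a known standard deviation and a normal approximation, with no FDR adjustment; the calculation in MSstats is based on the fitted mixed model and will differ. All function and parameter names below are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the normal quantiles for a two-sided test at level alpha."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def min_sample_size(log2_fc: float, sd: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest number of replicates per group that detects log2_fc."""
    if log2_fc == 0:
        raise ValueError("log2_fc must be non-zero")
    n = 2 * (sd * _z_sum(alpha, power) / abs(log2_fc)) ** 2
    return math.ceil(n)


def min_log2_fold_change(n_per_group: int, sd: float,
                         alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest absolute log2 fold change detectable with n_per_group."""
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    return sd * _z_sum(alpha, power) * math.sqrt(2 / n_per_group)


# Assumed inputs: a two-fold change (log2 = 1) and an SD of 0.5 log2 units.
n_needed = min_sample_size(log2_fc=1.0, sd=0.5)
fc_detectable = min_log2_fold_change(n_per_group=n_needed, sd=0.5)
print(f"replicates per group: {n_needed}")
print(f"detectable log2 fold change at that size: {fc_detectable:.3f}")
```

Because the sample size is rounded up to a whole number of replicates, the detectable fold change at that size is slightly smaller than the one requested.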
