Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA

Department of Animal Science, University of Tennessee, Knoxville, TN 37996, USA

Department of Chemistry, University of Tennessee, Knoxville, TN 37996, USA

Abstract

Background

Metabolomics is an emerging high-throughput approach to systems biology, but data analysis tools are lacking compared to other systems-level disciplines such as transcriptomics and proteomics. Metabolomic data analysis requires a normalization step to remove systematic effects of confounding variables on metabolite measurements. Current tools may not correctly normalize every metabolite when the relationships between each metabolite quantity and fixed-effect confounding variables differ, and they may not account for the effects of random-effect confounding variables. Linear mixed models, an established methodology in the microarray literature, offer a standardized and flexible approach for removing the effects of fixed- and random-effect confounding variables from metabolomic data.

Findings

Here we present a simple menu-driven program, “MetabR”, designed to aid researchers with no programming background in statistical analysis of metabolomic data. Written in the open-source statistical programming language R, MetabR implements linear mixed models to normalize metabolomic data and analysis of variance (ANOVA) to test treatment differences. MetabR exports normalized data, checks statistical model assumptions, identifies differentially abundant metabolites, and produces output files to help with data interpretation. Example data are provided to illustrate normalization for common confounding variables and to demonstrate the utility of the MetabR program.

Conclusions

We developed MetabR as a simple and user-friendly tool for implementing linear mixed model-based normalization and statistical analysis of targeted metabolomic data, helping to address the shortage of data analysis tools in this field. The program, user guide, example data, and any future news or updates related to the program may be found at

Findings

Background

Quantitative metabolomics is a high-throughput approach to systems biology in which many small molecules (metabolites) from a biological sample are simultaneously measured, commonly using nuclear magnetic resonance spectroscopy (NMR), gas chromatography-mass spectrometry (GC-MS), or liquid chromatography-mass spectrometry (LC-MS). While transcriptomics and proteomics are established approaches for characterizing the effects of experimental conditions on metabolism, gene and protein expression changes merely indicate the potential for changes in metabolic endpoints. Metabolic changes are “real-world” endpoints, so metabolomics can connect these functional genomics platforms with actual physiology.

LC-MS metabolomic approaches fall into two categories: those that attempt to measure every metabolite in the sample (untargeted approaches) and those that attempt to measure only a subset of the metabolites (targeted approaches).

Conventional LC-MS metabolomic data normalization is carried out by expressing each metabolite signal relative to values of sampling/measurement variables.

There are limitations to this conventional normalization approach, however. First, many metabolites are often normalized to a single internal standard (e.g., one for all positive ions and one for all negative ions). This introduces additional bias when the internal standard signal is weakly or negatively correlated with a metabolite signal (e.g., for metabolites with different chemical properties from the internal standard), or when the internal standard signal differs significantly between treatment groups. Second, while ignoring block factors (i.e., comparing metabolite means averaged across samples analyzed on different days) increases sample size, significant block effects on metabolite signals may widen confidence intervals, which may preclude identification of “significant” metabolites and conceal statistical outliers. Block effects may dramatically bias the data, especially if they are not balanced across treatment groups.
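The first limitation can be illustrated with a minimal sketch. The signal values below are hypothetical and the helper names (`ratio_normalize`, `cv`) are not part of MetabR; the point is only that dividing by an internal standard removes variation from a metabolite that tracks the standard, but adds variation to one that does not.

```python
from statistics import mean, stdev

def ratio_normalize(signals, internal_standard):
    """Conventional normalization: divide each metabolite signal by the
    internal standard (IS) signal measured in the same sample."""
    return [s / i for s, i in zip(signals, internal_standard)]

def cv(values):
    """Coefficient of variation (sample standard deviation / mean)."""
    return stdev(values) / mean(values)

# Hypothetical signals for five replicate samples (arbitrary units).
internal_standard = [0.8, 0.9, 1.0, 1.1, 1.2]
tracks_is = [80.0, 90.0, 100.0, 110.0, 120.0]      # proportional to the IS
ignores_is = [100.0, 100.0, 100.0, 100.0, 100.0]   # unaffected by the IS

# Spread before and after ratio normalization for each metabolite.
print(cv(tracks_is), cv(ratio_normalize(tracks_is, internal_standard)))
print(cv(ignores_is), cv(ratio_normalize(ignores_is, internal_standard)))
```

For the metabolite that is proportional to the internal standard, the ratio removes essentially all of the spread. For the metabolite that is unrelated to it, the ratio turns a constant signal into a variable one.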

Currently available software packages provide powerful tools for pre-processing (e.g., peak selection and integration and retention time alignment), visualization (e.g., biochemical pathway mapping), and/or interpretation of targeted and untargeted metabolomic data.

A flexible and standardized normalization approach that addresses these limitations would improve metabolomic analyses. An efficient and intuitive approach to control for confounding variables is to estimate their effects on metabolite signals using linear models. Rather than assuming similar relationships between each metabolite signal and confounding variables, a linear model fit for each metabolite can be used to estimate and partition the effects of each experimental variable, including the treatment factor, on each metabolite signal. Further, experimental variables can be modeled as having either a fixed or random effect on metabolite signals, with important implications. Fixed-effect variables are assumed to have a constant effect on metabolite signals, influence metabolite signals in an anticipated direction, and have a similar influence in replicate experiments. Common fixed-effect variables are number of cells, tissue mass, and ionization efficiency. By comparison, the effects of random-effect variables cannot be anticipated
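As a small illustration of fitting a separate model to each metabolite, the sketch below estimates the effect of tissue mass on two metabolites by ordinary least squares. The data and the function name `ols_slope_intercept` are hypothetical, and a single covariate is used for brevity; it is not the MetabR implementation.

```python
from statistics import mean

def ols_slope_intercept(x, y):
    """Least-squares fit of y = intercept + slope * x for one metabolite."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

tissue_mass = [10.0, 20.0, 30.0, 40.0]  # mg, hypothetical
signals = {
    "metabolite_A": [105.0, 210.0, 290.0, 405.0],  # scales strongly with mass
    "metabolite_B": [52.0, 54.0, 57.0, 59.0],      # scales weakly with mass
}

# One fit per metabolite, so each gets its own estimated mass effect.
fits = {name: ols_slope_intercept(tissue_mass, y) for name, y in signals.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: slope={slope:.2f} per mg, intercept={intercept:.2f}")
```

The two metabolites receive very different slopes, which a single shared correction factor could not capture.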

Mixed models can be used to estimate the effects of fixed- and random-effect variables on a response variable.

Given the limitations of current metabolomic data normalization approaches, we developed MetabR, a simple, user-friendly, and stand-alone tool that researchers with no programming background can use to implement linear model-based normalization and statistical analysis of targeted metabolomic data downstream of pre-processing. While MetabR is stand-alone, software with pre-processing tools

Methods

Implementation of MetabR

A graphical user interface (GUI)-based program, MetabR (Additional file

**MetabR.** MetabR program file.

**User Guide.** MetabR user guide.

**Screenshot of the MetabR GUI.**

In this program, either a fixed linear model (function “lm” in the “stats” package) or a linear mixed model (function “lmer” in the “lme4” package)

where μ = group mean,

Group = treatment factor,

Quantity = a measured, continuous value of the amount of tissue used to produce each sample,

IS = a measured, continuous value of the detection signal from an internal standard present in the metabolite extraction solvent,

Day = a normalization factor accounting for the effects of different run days on metabolite signals,

and e = residual error.
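Read together, the terms defined above suggest an additive model of roughly the following form for each metabolite. This is a sketch assembled from those definitions: the response symbol $Y$ (the measured metabolite signal), the absence of subscripts, and the absence of any transformation of the response are assumptions, and which terms are included depends on the experimental design.

```latex
Y = \mu + \text{Group} + \text{Quantity} + \text{IS} + \text{Day} + e
```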

The residuals and treatment group means from the fitted model are added together to yield normalized data, which adjusts for effects of sample quantity, ionization efficiency, and run day, as appropriate for the experimental design of the study.
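The "residual plus group mean" idea can be sketched for the simplest case of one treatment factor and one continuous covariate (Quantity). The code below is an illustration under stated assumptions, not the MetabR code: it uses a fixed-effects fit with a common slope, evaluates each group mean at the overall mean Quantity, and runs on hypothetical data.

```python
from statistics import mean

def normalize(y, group, quantity):
    """Fit y = group mean + slope * quantity + error by least squares, then
    return (adjusted group mean + residual) for every sample."""
    groups = sorted(set(group))
    idx = {g: [i for i, gi in enumerate(group) if gi == g] for g in groups}
    y_bar = {g: mean(y[i] for i in idx[g]) for g in groups}
    q_bar = {g: mean(quantity[i] for i in idx[g]) for g in groups}
    # Common slope pooled over within-group deviations.
    sxy = sum((quantity[i] - q_bar[g]) * (y[i] - y_bar[g])
              for g in groups for i in idx[g])
    sxx = sum((quantity[i] - q_bar[g]) ** 2 for g in groups for i in idx[g])
    slope = sxy / sxx
    q_all = mean(quantity)
    normalized = [0.0] * len(y)
    for g in groups:
        # Assumption: group mean evaluated at the overall mean quantity.
        adjusted_mean = y_bar[g] + slope * (q_all - q_bar[g])
        for i in idx[g]:
            residual = (y[i] - y_bar[g]) - slope * (quantity[i] - q_bar[g])
            normalized[i] = adjusted_mean + residual
    return normalized, slope

# Hypothetical data: the signal rises with tissue quantity in both groups.
group = ["control"] * 3 + ["treated"] * 3
quantity = [10.0, 20.0, 30.0, 15.0, 25.0, 35.0]
y = [71.0, 89.0, 110.0, 109.0, 131.0, 150.0]

normalized, slope = normalize(y, group, quantity)
print(slope)
print(normalized)
```

After normalization, the values within each group no longer trend with Quantity, while the difference between the group means is preserved.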

To check normality and equal variance assumptions made by linear models, R functions “shapiro.test” in the “stats” package (“stats” and any other packages not referenced are part of R

| Output | File type |
|---|---|
| Normalized data | CSV |
| Normalized data with technical replicates averaged | CSV |
| A plot of the model residuals for each metabolite vs. each metabolite’s overall mean signal | |
| A plot of the model residuals for each metabolite vs. each metabolite’s overall mean signal, expanded to accommodate metabolite labels | |
| Mean plots for all significant metabolites | CSV |
| Tukey HSD p-values for all treatment group comparisons for every metabolite | CSV |
| q-values for all treatment group comparisons for every metabolite | CSV |
| Mean fold-changes between all treatment group comparisons for every metabolite | CSV |
| Plots of all confounding variables vs. all metabolite measurements, pre- and post-normalization | |
| Heat map and dendrogram of the normalized data | |
| Spreadsheet for direct upload to Pathway Projector | CSV |

Experimental data collection

Two experimental datasets were generated in our lab to illustrate the utility of MetabR. In both experiments, adipose tissue samples were flash-frozen in liquid nitrogen and powdered with a mortar and pestle before metabolite extraction, which followed a previously described procedure.

The first experiment was designed to examine the effects of dietary restriction and insulin immunoneutralization on adipose tissue metabolism in chickens. A total of 127 metabolites were detected in abdominal adipose tissue from 16- or 17-day-old male broiler chicks that were fed

**Chicken_pos.** Chicken example data 1 from positive ionization mode.

**Chicken_neg.** Chicken example data 2 from negative ionization mode.

The second experiment was designed to examine the effects of Bisphenol A (BPA) on adipose tissue metabolism in mice. A total of 93 metabolites were detected in abdominal adipose tissue from 32 16-week-old inbred male mice which, from weaning, were fed

**Mouse_pos.** Mouse example data 1 from positive ionization mode.

**Mouse_neg.** Mouse example data 2 from negative ionization mode.

Modeling confounding variables as fixed- vs. random-effect

In our chicken example, Group, Quantity, and IS were modeled as fixed-effect variables, while Day was modeled as a random-effect variable. To illustrate the difference: if Day is defined as a fixed-effect variable, the estimated treatment group mean includes the average Day effect, and the variance and corresponding confidence intervals are based only on residual error and sample size. Inferences about treatment effects then apply only to the days used in the experiment. If Day is defined as a random-effect variable, the estimated mean no longer includes Day. Instead, the Day effect becomes a source of random variation that is added to the variance of the estimated mean. The variance and confidence intervals are larger than when Day is treated as a fixed-effect variable, but experimental results can now be correctly extrapolated to all possible days.
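The widening of the confidence interval can be sketched numerically. The example assumes a balanced design in which each treatment group has `n_samples` observations spread evenly over `n_days` run days, uses a normal approximation for the 95% interval, and takes hypothetical variance components; a fitted mixed model would estimate these quantities and would typically use a t-based interval.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(var_residual, n_samples, var_day=0.0, n_days=1):
    """Approximate 95% CI half-width of a group mean in a balanced design.
    With Day fixed, only residual error contributes (var_day = 0).
    With Day random, the between-day variance is added to the variance
    of the mean."""
    var_mean = var_residual / n_samples + var_day / n_days
    z = NormalDist().inv_cdf(0.975)  # normal approximation, about 1.96
    return z * sqrt(var_mean)

# Hypothetical variance components: 8 samples per group run over 2 days.
fixed = ci_half_width(var_residual=4.0, n_samples=8)
random_day = ci_half_width(var_residual=4.0, n_samples=8, var_day=1.5, n_days=2)
print(f"Day fixed:  +/- {fixed:.2f}")
print(f"Day random: +/- {random_day:.2f}")
```

With few run days, the between-day term can dominate the variance of the mean, which is why treating Day as random gives wider but more generalizable intervals.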

Results

Chicken experimental results

For the chicken data, Quantity (tissue mass) and IS (internal standard measurement; Tris in positive ionization mode and benzoic acid in negative ionization mode) were selected as fixed-effect regression variables, and Day (run day) as a random-effect factor.

Summary information printed in the R console (not shown) includes 1) results from the Shapiro-Wilk test of normality; 2) results from Levene’s test of equality of variance; 3) pairwise mean fold-changes between all treatment groups for significant metabolites (also exported into a spreadsheet; see Table

Mean fold-changes among the three treatment groups for the chicken example data (14 metabolites across positive and negative ionization modes), and associated Tukey HSD p-values for mean differences (bold values are p < 0.05).

| Metabolite | Fast-control fold-change | Fast-control p-value | InsNeut-control fold-change | InsNeut-control p-value | InsNeut-fast fold-change | InsNeut-fast p-value |
|---|---|---|---|---|---|---|
| ATP | 1.273 | 0.384 | 1.059 | 0.932 | 0.832 | 0.588 |
| Citraconate | 0.969 | 0.694 | 0.982 | 0.915 | 1.014 | 0.907 |
| Citrate | 1.251 | **0.047** | 1.054 | 0.720 | 0.842 | 0.196 |
| Dihexose | 0.082 | **<0.001** | 0.590 | 0.928 | 7.217 | **0.001** |
| Inosine | 0.736 | 0.328 | 0.910 | 0.580 | 1.236 | 0.890 |
| Lactate | 0.873 | 0.137 | 0.991 | 0.974 | 1.135 | 0.198 |
| Pyruvate | 1.100 | 0.353 | 1.065 | 0.640 | 0.969 | 0.870 |
| 2-Oxoglutarate | 0.929 | 0.754 | 1.511 | **0.001** | 1.627 | **<0.001** |
| 1-Methyladenosine | 0.934 | 0.878 | 0.923 | 0.865 | 0.989 | 1.000 |
| Glutamine | 0.676 | **0.026** | 2.512 | **<0.001** | 3.715 | **<0.001** |
| Guanosine | 0.762 | 0.215 | 0.833 | 0.257 | 1.094 | 0.993 |
| O-Acetyl-L-serine | 0.614 | 0.337 | 2.276 | 0.085 | 3.707 | **0.004** |
| Glucosamine | 1.014 | 0.959 | 2.073 | **<0.001** | 2.044 | **<0.001** |
| Thiamine | 0.486 | 0.059 | 0.781 | 0.860 | 1.607 | 0.156 |
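The comparison labels do not state how each fold-change is formed. The tabulated values are consistent with a ratio of group means in which a label "A-B" denotes A divided by B, since the InsNeut-fast value is then implied by the two control-based values. This reading is an inference from the numbers rather than something the text states; the sketch below checks it on several rows of the chicken table.

```python
# (Fast/control, InsNeut/control, InsNeut/fast) as reported in the table above.
reported = {
    "ATP": (1.273, 1.059, 0.832),
    "Dihexose": (0.082, 0.590, 7.217),
    "2-Oxoglutarate": (0.929, 1.511, 1.627),
    "Glutamine": (0.676, 2.512, 3.715),
    "O-Acetyl-L-serine": (0.614, 2.276, 3.707),
    "Glucosamine": (1.014, 2.073, 2.044),
}

def relative_gap(fast_vs_control, insneut_vs_control, insneut_vs_fast):
    """Relative difference between the reported InsNeut/fast fold-change and
    the value implied by the two control-based fold-changes."""
    implied = insneut_vs_control / fast_vs_control
    return abs(implied - insneut_vs_fast) / insneut_vs_fast

gaps = {name: relative_gap(*values) for name, values in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:.2%}")
```

The gaps are small enough to be explained by rounding of the reported values to three decimal places.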

**Residual error plot for the chicken experiment.** Legend - Linear model residuals are plotted in relation to overall mean metabolite level.

**Group mean plots for O-Acetyl-L-serine in the chicken experiment.** Legend - Treatment group metabolite means, 95% confidence intervals, mean fold-changes, and significant difference letters are combined to summarize results for each significant metabolite.

**Pre- and post-normalization plots: metabolite vs. Day.** Legend - Citrate is plotted before and after normalization, showing the effectiveness of the normalization model for removing confounding variation in the chicken experiment. Normalization removed the effect of different run days on the Citrate detection signal.

**Pre- and post-normalization plots: metabolite vs. tissue quantity.** Legend - Normalization removed the correlation between the quantity of tissue analyzed and the Pyruvate detection signal in the chicken experiment.

**Heat map and dendrogram.** Legend - The heat map was produced by the MetabR program using the chicken example data included in Additional files

Mouse experimental results

MetabR was run on the mouse example data in Additional files

Mean fold-changes among the four treatment groups for the mouse example data (12 metabolites across positive and negative ionization modes), and associated Tukey HSD p-values for mean differences (bold values are p < 0.05).

| Metabolite | BPA500-BPA50 fold-change | BPA500-BPA50 p-value | BPA5000-BPA50 fold-change | BPA5000-BPA50 p-value | Control-BPA50 fold-change | Control-BPA50 p-value | BPA5000-BPA500 fold-change | BPA5000-BPA500 p-value | Control-BPA500 fold-change | Control-BPA500 p-value | Control-BPA5000 fold-change | Control-BPA5000 p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bisphenol A | 0.817 | 0.998 | 0.455 | 0.423 | 1.420 | 0.984 | 0.558 | 0.490 | 1.738 | 0.946 | 3.117 | 0.261 |
| Glucose-6-phosphate | 1.042 | 0.081 | 0.987 | 0.859 | 1.023 | 0.545 | 0.947 | **0.013** | 0.981 | 0.654 | 1.036 | 0.168 |
| Lactate | 1.663 | 0.298 | 1.177 | 0.923 | 1.353 | 0.401 | 0.708 | 0.652 | 0.814 | 0.997 | 1.149 | 0.771 |
| Citrate | 1.064 | 1.000 | 3.265 | 0.120 | 2.273 | 0.219 | 3.070 | 0.141 | 2.137 | 0.252 | 0.696 | 0.988 |
| Isocitrate | 0.809 | 0.219 | 1.134 | 0.644 | 1.117 | 0.731 | 1.401 | **0.019** | 1.380 | **0.026** | 0.985 | 0.999 |
| Phosphoenolpyruvate | 1.218 | 0.551 | 1.476 | 0.167 | 0.793 | 0.962 | 1.212 | 0.852 | 0.651 | 0.287 | 0.537 | 0.064 |
| Thymine | 0.868 | 0.919 | 0.552 | **0.025** | 1.118 | 0.972 | 0.636 | 0.100 | 1.288 | 0.710 | 2.026 | **0.009** |
| Urea | 1.325 | 0.971 | 0.960 | 0.993 | 1.084 | 0.947 | 0.725 | 0.894 | 0.818 | 1.000 | 1.129 | 0.849 |
| N-Acetyl-L-glutamate | 0.449 | **0.001** | 0.518 | **0.007** | 0.548 | **0.014** | 1.152 | 0.789 | 1.220 | 0.638 | 1.059 | 0.994 |
| ADP | 1.264 | 0.907 | 7.812 | 0.092 | 11.948 | 0.035 | 6.180 | 0.280 | 9.452 | 0.124 | 1.530 | 0.957 |
| Tryptophan | 1.086 | 0.998 | 0.757 | 0.461 | 0.870 | 0.912 | 0.697 | 0.367 | 0.801 | 0.841 | 1.150 | 0.843 |
| Ornithine | 1.813 | **0.008** | 1.563 | 0.071 | 1.231 | 0.476 | 0.862 | 0.776 | 0.679 | 0.189 | 0.788 | 0.686 |

Conclusions

The open-source statistical computing software R

Availability and requirements

**Project name:** MetabR

**Project home page:**

**Operating system(s):** Windows, Mac, Linux, any system that runs R

**Programming language:** R

**Other requirements:** Required R packages are installed automatically. The program was written and tested using R version 2.15 for Windows.

**License:** GNU General Public License (GPL)

**Any restrictions to use by non-academics:** No restrictions

Availability of supporting data

The datasets supporting the results of this article are included within the article (and its additional files).

Abbreviations

ANOVA: Analysis of variance; BPA: Bisphenol A; CSV: Comma-separated values; GUI: Graphical user interface; HSD: Honest Significant Difference; IS: Internal standard; LC-MS: Liquid chromatography-mass spectrometry; LC-MS/MS: Liquid chromatography-tandem mass spectrometry.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

BE wrote the program. BE, SRC, BHV, and JRG collaborated to outline the issues in data analysis and process all biological data. AMS guided implementation of the statistical analysis components of the program. JRG and BE tested the implementation of the program, and all authors contributed to writing the final manuscript draft. All authors read and approved the final manuscript.

Authors’ information

J Gooding’s current address: Sarah W. Stedman Nutrition & Metabolism Center, Duke University School of Medicine, 4321 Medical Park Drive, Suite 200, Durham, NC 27704

Acknowledgements

JRG and SRC were supported by funding from the National Science Foundation through an Ocean Sciences award (OCE-1061352) to the University of Tennessee at Knoxville. Funding for metabolomic analyses of chicken adipose tissue was provided by a University of Tennessee AgResearch Innovation Grant to BHV and SRC.

The authors thank Brantley Wyatt, previously of the University of Tennessee Graduate School of Genome Science and Technology, for conducting the mouse experiments and generating the mouse adipose tissue samples used in this work, and Drs. Joelle Dupont and Jean Simon of the Institut National de la Recherche Agronomique (INRA) for conducting the chicken experiments and providing the corresponding adipose tissue samples.