Open Access Highly Accessed Methodology article

Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis

Hiroyuki Yamamoto*, Tamaki Fujimori, Hajime Sato, Gen Ishikawa, Kenjiro Kami and Yoshiaki Ohashi

Author Affiliations

Human Metabolome Technologies, Inc, 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata 997-0052, Japan

BMC Bioinformatics 2014, 15:51  doi:10.1186/1471-2105-15-51

Published: 21 February 2014

Abstract

Background

Principal component analysis (PCA) has been widely used to visualize high-dimensional metabolomic data in a two- or three-dimensional subspace. In metabolomics, a fixed number of metabolites (e.g., the top 10) is often selected subjectively on the basis of factor loading in PCA, and biological inferences are then made for these metabolites. However, this approach may lead to biased biological inferences because the metabolites are not selected objectively with statistical criteria.

Results

We propose a statistical procedure that selects metabolites by statistical hypothesis testing of the factor loading in PCA and then draws biological inferences about the significant metabolites with a metabolite set enrichment analysis (MSEA). The procedure relies on the fact that, for autoscaled data, the eigenvector in PCA is proportional to the correlation coefficient between the PC score and each metabolite level. We applied this approach to two sets of metabolomic data from mouse liver samples: 136 of 282 metabolites were statistically significant in the first case study, and 66 of 275 in the second. Because the number of significant metabolites differs between studies, fixing the number of metabolites to select before the analysis is inappropriate when factor loading in PCA is used. Moreover, an MSEA of these significant metabolites detected significant metabolic pathways that were consistent with previous biological knowledge.
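
The relation between the eigenvector and the correlation coefficient can be illustrated with a minimal Python sketch. It is not the "mseapca" implementation: the function name `pc_loading_test`, the simulated data, and the use of the standard t-test for a correlation coefficient (n - 2 degrees of freedom) are assumptions made for illustration. For autoscaled data, the factor loading is taken as the square root of the eigenvalue times the eigenvector, which equals the correlation between the PC score and each variable.

```python
import numpy as np
from scipy import stats


def pc_loading_test(X, pc=0):
    """Factor loadings of one PC and two-sided p-values per variable.

    Hypothetical helper for illustration; rows are samples and
    columns are variables (metabolites).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Autoscale: zero mean and unit sample variance per column.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA via SVD of the autoscaled matrix; rows of Vt are eigenvectors.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigval = s[pc] ** 2 / (n - 1)        # eigenvalue of the correlation matrix
    score = Z @ Vt[pc]                   # PC score of each sample
    loading = np.sqrt(eigval) * Vt[pc]   # correlation of score with each column
    # Assumed test: t-test for a correlation coefficient, n - 2 d.f.
    t = loading * np.sqrt((n - 2) / (1.0 - loading ** 2))
    p = 2.0 * stats.t.sf(np.abs(t), df=n - 2)
    return loading, p, score


# Simulated data (not from the paper): 20 samples, 6 variables,
# the first three sharing a common latent factor.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
X[:, :3] += 2.0 * rng.normal(size=(20, 1))

loading, p, score = pc_loading_test(X, pc=0)
significant = np.flatnonzero(p < 0.05)   # indices of variables selected at 5%
```

The number of selected variables is an outcome of the test and is not set in advance. In practice a multiple-testing correction would normally be applied to the p-values; the sketch omits it.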

Conclusions

It is essential to select metabolites statistically to make unbiased biological inferences from metabolomic data when using factor loading in PCA. We propose a statistical procedure to select metabolites with statistical hypothesis testing of the factor loading in PCA, and to draw biological inferences about these significant metabolites with MSEA. We have developed an R package “mseapca” to facilitate this approach. The “mseapca” package is publicly available at the CRAN website.
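
The enrichment step can be sketched in the same spirit. The abstract does not state which enrichment statistic is used, so the example below assumes an over-representation analysis with a one-sided hypergeometric test. The function `ora_p_value` and the pathway in the usage example are hypothetical and are not part of the "mseapca" API.

```python
from math import comb


def ora_p_value(n_total, n_significant, set_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    n_total:       number of measured metabolites
    n_significant: number of metabolites with a significant loading
    set_size:      number of measured metabolites in the pathway
    overlap:       number of pathway metabolites that are significant
    """
    upper = min(set_size, n_significant)
    denom = comb(n_total, n_significant)
    numer = sum(
        comb(set_size, k) * comb(n_total - set_size, n_significant - k)
        for k in range(overlap, upper + 1)
    )
    return numer / denom


# Usage with the counts of the first case study (136 of 282 significant)
# and a hypothetical pathway of 10 measured metabolites, 9 of them significant.
p_pathway = ora_p_value(n_total=282, n_significant=136, set_size=10, overlap=9)
```

A small p-value indicates that the pathway contains more significant metabolites than expected by chance, given how many metabolites were significant overall.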

Keywords:
Principal component analysis; Statistical hypothesis testing of factor loading; Metabolite set enrichment analysis