Open Access Research article

Automated workflow-based exploitation of pathway databases provides new insights into genetic associations of metabolite profiles

Harish Dharuri1, Peter Henneman2, Ayse Demirkan1,3, Jan Bert van Klinken1, Dennis Owen Mook-Kanamori1,2,4, Rui Wang-Sattler5, Christian Gieger6, Jerzy Adamski7,8, Kristina Hettne1, Marco Roos1, Karsten Suhre4,9, Cornelia M Van Duijn3, EUROSPAN consortia, Ko Willems van Dijk1,10 and Peter AC 't Hoen1*

Author Affiliations

1 Center for Human and Clinical Genetics, Leiden University Medical Center, S4-P, PO Box 9600, 2300 RC Leiden, Netherlands

2 Department of Clinical Genetics, DNA Diagnostics Laboratory, University of Amsterdam, Amsterdam, Netherlands

3 Genetic Epidemiology Unit, Departments of Epidemiology and Clinical Genetics, Erasmus University Medical Center, Rotterdam, Netherlands

4 Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Qatar Foundation, PO Box 24144, Doha, State of Qatar

5 Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

6 Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

7 Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

8 Chair of Experimental Genetics, Technische Universität München, Munich, Germany

9 Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

10 Department of Endocrinology, Leiden University Medical Center, S4-P, PO Box 9600, 2300 RC Leiden, Netherlands

BMC Genomics 2013, 14:865  doi:10.1186/1471-2164-14-865

Published: 9 December 2013

Abstract

Background

Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging since testing all metabolites measured in typical metabolomics studies with all SNPs comes with a severe multiple testing penalty. We have developed an automated workflow approach that utilizes prior knowledge of biochemical pathways present in databases like KEGG and BioCyc to generate a smaller SNP set relevant to the metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets.
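To make the multiple-testing argument concrete, the following is a minimal sketch and not the authors' workflow. It assumes a plain Bonferroni correction and uses entirely hypothetical metabolite, gene and SNP identifiers, and hypothetical study sizes, to show how restricting the tests to SNPs in pathway-linked genes relaxes the per-test significance threshold.

```python
from typing import Dict, Set

# Hypothetical placeholder mappings; a real workflow would query
# pathway databases such as KEGG or BioCyc instead of hard-coding these.
PATHWAY_GENES: Dict[str, Set[str]] = {
    "metabolite_A": {"GENE1", "GENE2"},
}
GENE_SNPS: Dict[str, Set[str]] = {
    "GENE1": {"rs_hyp_1", "rs_hyp_2"},
    "GENE2": {"rs_hyp_3"},
    "GENE3": {"rs_hyp_4"},
}


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def candidate_snps(
    metabolite: str,
    pathway_genes: Dict[str, Set[str]],
    gene_snps: Dict[str, Set[str]],
) -> Set[str]:
    """Collect SNPs in genes that share a pathway with the metabolite."""
    snps: Set[str] = set()
    for gene in pathway_genes.get(metabolite, set()):
        snps.update(gene_snps.get(gene, set()))
    return snps


if __name__ == "__main__":
    # Assumed study sizes, chosen only for illustration.
    n_snps, n_metabolites = 500_000, 150
    genome_wide = bonferroni_threshold(0.05, n_snps * n_metabolites)

    subset = candidate_snps("metabolite_A", PATHWAY_GENES, GENE_SNPS)
    restricted = bonferroni_threshold(0.05, len(subset))

    print(f"all SNPs x all metabolites: p < {genome_wide:.2e}")
    print(f"pathway-restricted ({len(subset)} SNPs): p < {restricted:.2e}")
```

In this sketch the restricted threshold depends only on the number of SNPs retained for a given metabolite, so an association of moderate strength that would fail the genome-wide threshold can still be detected within the pathway-derived set.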

Results

Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html) confirmed previously identified hits and identified a new locus of human metabolic individuality, associating Aldehyde dehydrogenase family 1 L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (Glycerol-3-phosphate acyltransferase) and CBS (Cystathionine beta-synthase). In addition, the workflow approach provided novel insight into the affected pathways and the relevance of some of these gene-metabolite pairs to disease development and progression.

Conclusions

We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS datasets of metabolomic phenotypes. We report novel loci and potential biochemical mechanisms that contribute to our understanding of the genetic basis of metabolic variation and its relationship to disease development and progression.

Keywords:
Genome-wide association; Metabolite; Genotype-phenotype prioritization; Bioinformatics; Pathway databases