Open Access Methodology article

Network-based group variable selection for detecting expression quantitative trait loci (eQTL)

Weichen Wang1* and Xuegong Zhang2,3

Author Affiliations

1 Mathematics and Physics, School of Sciences, Tsinghua University, Beijing 100084, China

2 MOE Key Laboratory of Bioinformatics/Bioinformatics Division, TNLIST, Beijing 100084, China

3 Department of Automation, Tsinghua University, Beijing 100084, China

BMC Bioinformatics 2011, 12:269  doi:10.1186/1471-2105-12-269

Published: 30 June 2011

Abstract

Background

Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression levels of genes. Penalized regression with a suitable penalty is well suited to such high-dimensional biological data. Its performance should improve when biological knowledge of the gene expression network and of the linkage disequilibrium (LD) structure between loci is incorporated, particularly against a high-noise background.

Results

We propose a network-based group variable selection (NGVS) method for QTL detection. Our method simultaneously maps highly correlated expression traits that share the same biological function to marker sets formed by LD. Grouping markers allows the complex joint activity of multiple SNPs to be considered and dramatically reduces the dimensionality of the eQTL problem. To demonstrate the power and flexibility of our method, we applied it to two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets, and treated the additive and dominant effects of each locus as a group; as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions among multiple SNPs.
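
To illustrate what treating the additive and dominant effects of each locus as a group can mean in practice, the following is a minimal sketch, not the authors' NGVS implementation. It assumes genotypes are coded 0/1/2, uses a common additive/dominance coding, and applies the generic block soft-thresholding step used by group-lasso-type penalties. The function names `encode_additive_dominant` and `group_soft_threshold` are hypothetical and chosen only for this example.

```python
import numpy as np


def encode_additive_dominant(genotypes):
    """Expand a (samples x loci) matrix of 0/1/2 genotype codes into two
    columns per locus: an additive term and a dominance term.
    The 0/1/2 coding is an assumption made for this illustration."""
    g = np.asarray(genotypes, dtype=float)
    n, p = g.shape
    design = np.empty((n, 2 * p))
    design[:, 0::2] = g - 1.0                  # additive: -1, 0, 1
    design[:, 1::2] = (g == 1).astype(float)   # dominance: 1 for heterozygotes
    return design


def group_soft_threshold(beta, groups, threshold):
    """Shrink each group of coefficients as a block: a group is set to zero
    when its Euclidean norm is at most the threshold, and is otherwise
    scaled toward zero. This is the generic group-lasso proximal step."""
    beta = np.asarray(beta, dtype=float)
    shrunk = np.zeros_like(beta)
    for idx in groups:
        norm = np.linalg.norm(beta[idx])
        if norm > threshold:
            shrunk[idx] = (1.0 - threshold / norm) * beta[idx]
    return shrunk


# One sample, three loci; columns 2j and 2j+1 belong to locus j.
design = encode_additive_dominant([[0, 1, 2]])
# Two groups of two coefficients each (additive, dominance) per locus.
shrunk = group_soft_threshold([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], 1.0)
```

Because the penalty acts on the norm of each group, the additive and dominance coefficients of a locus are kept or dropped together, which is the behavior that distinguishes group selection from the coefficient-by-coefficient selection of the ordinary Lasso.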

Conclusions

The proposed NGVS method is appropriate for problems with high-dimensional data and a high-noise background. On the eQTL problem, it outperforms the classical Lasso method, which does not take biological knowledge into account. Incorporating appropriate gene expression and loci correlation information makes the detection of causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.