Open Access Research article

Synthesis of 53 tissue and cell line expression QTL datasets reveals master eQTLs

Xiaoling Zhang1, Hinco J Gierman2, Daniel Levy1, Andrew Plump3, Radu Dobrin4, Harald HH Goring5, Joanne E Curran5, Matthew P Johnson5, John Blangero5, Stuart K Kim2, Christopher J O’Donnell1,6, Valur Emilsson7 and Andrew D Johnson1*

Author Affiliations

1 Division of Intramural Research, National Heart, Lung and Blood Institute, Cardiovascular Epidemiology and Human Genomics Branch, The Framingham Heart Study, 73 Mt. Wayte Ave., Suite #2, Framingham, MA, USA

2 Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA

3 Sanofi Aventis Pharmaceuticals, Bridgewater, NJ 08807, USA

4 Johnson & Johnson Pharmaceutical Research and Development, Radnor, PA 19477, USA

5 Department of Genetics, Texas Biomedical Research Institute, San Antonio, TX 78227, USA

6 Division of Cardiology, Massachusetts General Hospital, Boston, MA 02114, USA

7 Icelandic Heart Association, Kopavogur, Iceland

BMC Genomics 2014, 15:532  doi:10.1186/1471-2164-15-532


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/15/532


Received: 30 December 2013
Accepted: 18 June 2014
Published: 27 June 2014

© 2014 Zhang et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background

Gene expression genetic studies in human tissues and cells identify cis- and trans-acting expression quantitative trait loci (eQTLs). These eQTLs provide insights into regulatory mechanisms underlying disease risk. However, few studies have systematically characterized eQTL results across cell and tissue types. We synthesized eQTL results from >50 datasets, including new primary data from human brain, peripheral plaque and kidney samples, in order to discover features of human eQTLs.

Results

We find a substantial number of robust cis-eQTLs, and far fewer trans-eQTLs, that are consistent across tissues. Analysis of 45 full human GWAS scans indicates that eQTLs are enriched overall, and more so than nonsynonymous SNPs (nSNPs), among positive statistical signals in genetic mapping studies, and that they account for a significant fraction of the strongest human trait effects. Expression QTLs are enriched for gene centricity, higher population allele frequencies, location in housekeeping genes, and coincidence with regulatory features, though there is little evidence of 5′ or 3′ positional bias. Several regulatory categories are not enriched, including microRNAs and their predicted binding sites and long intergenic non-coding RNAs. Among the most tissue-ubiquitous cis-eQTLs, there is enrichment for genes involved in xenobiotic metabolism and mitochondrial function, suggesting these eQTLs may have adaptive origins. Several strong eQTLs (CDK5RAP2, NBPFs) coincide with regions of reported human lineage selection. The intersection of new kidney and plaque eQTLs with related GWAS results may help prioritize candidate genes. For example, butyrophilins are now linked to arterial pathogenesis via multiple genetic and expression studies. Expression QTL and GWAS results are made available as a community resource through the NHLBI GRASP database [http://apps.nhlbi.nih.gov/grasp/].

Conclusions

Expression QTLs inform the interpretation of human trait variability, and may account for a greater fraction of phenotypic variability than protein-coding variants. The synthesis of available tissue eQTL data highlights many strong cis-eQTLs that may have important biologic roles and could serve as positive controls in future studies. Our results indicate some strong tissue-ubiquitous eQTLs may have adaptive origins in humans. Efforts to expand the genetic, splicing and tissue coverage of known eQTLs will provide further insights into human gene regulation.

Keywords:
eQTL; RNA; Gene expression; Genomics; Transcriptome; GWAS; Genome-wide; Tissue; Cis; Trans

Background

Genome-wide genetic analysis of gene expression [1,2] identifies expression quantitative trait loci (eQTLs), which are mainly regulatory variants associated in cis with the expression of nearby genes. Discovery of eQTLs may help elucidate the genetic mechanisms underlying natural variation in gene expression [3,4]. Identifying these genetic variants may improve our understanding of the molecular mechanisms of disease risk and reveal potential drug targets. Human cross-tissue allele-specific expression studies indicate that a significant fraction of genes are under genetic control by one or more alleles [5-7]. Strong eQTLs are often highly correlated with markers of disease and quantitative traits at loci identified in GWAS [8-13], suggesting that these eQTLs account for a significant fraction of human phenotypic variability. However, to date there have been few attempts to characterize cross-tissue eQTL datasets in a centralized manner.

Thus far, eQTL studies have analyzed gene expression traits measured primarily by DNA microarrays in liver [9,14-16], multiple blood cell types [17-27], brain regions [24,28-31], endothelial cells [32], stomach [9], skin [33], and adipose tissue [9,19]. Expression QTL effects are often partitioned into cis- or trans-acting effects, and few studies have thoroughly characterized trans-eQTL associations, in part because of the computational burden [34]. Furthermore, approaches to data collection and analysis of cis- and trans-eQTLs have been relatively non-uniform [34,35]. Dimas et al. compared eQTLs discovered in 3 blood-related cell types [17] and found that only ~30% of eQTLs were directly shared across tissues. Later studies undertook multi-tissue comparisons of cis-eQTLs, including lymphoblastoid cell lines (LCLs) versus skin cells [33]; LCLs, skin, and fat [36]; liver, omental, and subcutaneous adipose [9]; and a re-analysis of the Dimas et al. datasets with new methods [37]. Overall, these later studies found evidence for a high degree of sharing (~50-80%) of cis-eQTLs across tissues, while still indicating that a significant minority of cis-eQTLs remain relatively tissue-specific. Prior studies compared at most 4 tissues and generally did not include external validation of signals or studies of trans-eQTLs. Thus, a rigorous comparison across many tissues and populations with good statistical power remains incomplete.

We sought to collect, standardize, and annotate a variety of eQTL results in a comprehensive central database in order to answer several basic research questions about eQTLs: 1) Are there master/housekeeping cis- and trans-eQTLs across tissues, and what are their biologic functions? 2) What consistent cis- and trans-eQTL patterns emerge across datasets, including genomic position and overlap with regulatory annotations? 3) Which genome-wide association study (GWAS) variants converge with eQTL peaks? 4) Does integration of disparate eQTL data identify new trans-acting loci?

To address these questions we collected and analyzed available results from 53 eQTL population datasets. These 53 datasets represent analyses from 24 published manuscripts and 13 previously unpublished analyses reflecting >27 cell and tissue types. Most summary-level results are available for download as a subset of the NHLBI Genome-wide Repository of Associations between SNPs and Phenotypes (GRASPdb) [38].

Results

Characteristics of 53 gene expression GWAS (eQTL) datasets

The eQTL datasets (n = 53) included liver [9,14-16], adipose tissues [9,19], various brain tissues [24,28-31] and blood-lineage cells, including whole blood [19,20,23,25], lymphocytes [17,21,26], monocytes [24,39] and Epstein-Barr virus-transformed B-LCLs [17,18,27]. Other cells and tissues included osteoblasts [22], fibroblasts [17], kidney, stomach [9], skin [33] and peripheral artery plaque (see Table 1 for study summaries and [Additional file 1] for detailed characteristics). In some cases, significant results beyond those originally reported were available via collaboration; otherwise, the results reflected either new results from this paper or publicly available eQTL results that passed the statistical correction thresholds defined by the original authors. Sample sizes varied widely across these studies (range n = 52-1,490, median n = 193, mean n = 311). Some of the 53 datasets reflected subgroup analyses (e.g., cases or controls, European or African ancestry). After common annotation of all datasets, dataset sample size showed a modest logarithmic fit with the number of cis-eGenes identified (r2 = 0.45) and a weaker fit with the number of trans-eGenes (r2 = 0.24) [Additional file 1]. This suggests that many prior studies may have been underpowered, but that signal saturation may be approached with several thousand samples.
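The logarithmic fit between dataset sample size and the number of cis-eGenes can be illustrated with an ordinary least-squares regression of eGene count on ln(sample size). This is a minimal sketch assuming a simple model y = a + b·ln(n); the exact model used by the authors is not stated (the log base does not change r2), and the input numbers below are hypothetical, not values from Table 1.

```python
import math


def log_fit_r2(sample_sizes, n_egenes):
    """Fit n_egenes = a + b * ln(sample_size) by least squares; return (a, b, r2)."""
    xs = [math.log(n) for n in sample_sizes]
    ys = [float(y) for y in n_egenes]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # eGenes gained per unit of ln(sample size)
    a = mean_y - b * mean_x        # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical (sample size, cis-eGene count) pairs for illustration only.
sizes = [52, 120, 193, 400, 1490]
egenes = [300, 1500, 2100, 4200, 5200]
a, b, r2 = log_fit_r2(sizes, egenes)
print(f"cis-eGenes ~ {a:.0f} + {b:.0f} * ln(n), r2 = {r2:.2f}")
```

Under this model each doubling of sample size adds a constant b·ln 2 eGenes, so the relative gain shrinks as studies grow, which is one way to read the remark about approaching saturation.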

Table 1. Summary of 53 eQTL datasets, their origins and original reported parameters

Additional file 1. eQTL dataset origins and descriptions. eQTL dataset sources and information about sample sizes, total cis and trans eQTLs and eSNPs, SNP and expression platforms.

Format: XLSX Size: 14KB

Genotyping and gene expression arrays were heterogeneous across the datasets (Table 1). Genotyping assays included Affymetrix (500 K, 6.0), Illumina (100 K, 300 K, 550 K, 610 K-Quad, 650 K) and Perlegen (300 K, 438 K) SNP arrays. Only a small proportion of datasets (n = 10, 18.9%) included analysis of imputed SNPs. Expression assays included custom arrays, Affymetrix (Human Exon 1.0 ST, U133A/B, U133 Plus 2.0), and Illumina (WG-6 v1, WG-6 v3, HumanRefSeq-8 v2, HT12) arrays, with a mean of 20,246 RNAs interrogated across unique studies. Thus, these analyses primarily reflected mRNA expression of protein-coding genes, with few splice-specific analyses [24]. The datasets used different criteria for reporting significant results, including different multiple-testing correction thresholds and different distance thresholds for defining cis-acting eQTLs (range = 100 kb to 5 Mb). As a result of these combined factors, as well as varying statistical power, whether trans analysis was conducted, and the extent of disclosed results, the number of significant eQTLs reported by the studies varied widely (range n = 33–22,473).

Frequency of eGenes and eQTLs across 53 datasets after common annotation

A total of 19,444 eGenes mapped directly to NCBI RefSeq gene symbols (n = 17,294) or RefSeq gene aliases (n = 2,150) [Additional file 2]. The majority of both eGenes and eQTLs were reported in only one dataset (Figure 1), which may reflect false positives, tissue-specific effects, a lack of statistical power, or differences in SNP and/or transcript coverage across studies. Nevertheless, 1,784 eGenes were found in ≥15 datasets (~30% of datasets) (Figure 1A).

Additional file 2. Summary of all eQTLs and eGenes and their mapping and filtering. Description of filtering steps and number of eQTLs, eSNPs and eGenes.

Format: XLSX Size: 10KB

Figure 1. Frequency of eGenes and eQTLs across 53 datasets. A: Distribution of the occurrence of 19,038 unique eGenes across all 53 eQTL datasets. Inset: histogram of 1,784 genes found in ≥15 eQTL datasets. B: Distribution of the occurrence of 56,089 unique, best cis-eQTLs across all 53 eQTL datasets. Inset: histogram of 279 cis-eQTLs found in ≥15 eQTL datasets. C: Distribution of the occurrence of 7,075 unique, best trans-eQTLs across all 53 eQTL datasets. Inset: histogram of 37 trans-eQTLs found in ≥4 eQTL datasets. For each trans-eQTL, all proxy SNPs in perfect linkage disequilibrium (r2 = 1 in CEU) are also included [42].

A total of 419,796 eQTLs passed at least nominal statistical correction thresholds in the 53 original sources. These included redundant eQTLs in relatively high linkage disequilibrium (LD) within some datasets. We retained the most significant eQTL for each eGene within each dataset, yielding 116,563 “best” eQTLs from the constituent datasets. We mapped all best eQTLs to a common genome build (hg18) and applied a uniform distance threshold (500 kb) across all 53 datasets to define cis- and trans-acting variants, finding 106,083 cis-eQTL-eGene associations (91%) and 10,480 trans-eQTL-eGene associations (9%). On average, each eGene is associated with 1.8 eQTLs. Of the 62,872 unique best eQTLs across datasets, 279 cis-eQTLs are found in ≥15 datasets (~30% of datasets) (Figure 1B), while only 37 SNPs are trans-associated with eGenes in ≥4 datasets (Figure 1C).
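The reduction to one "best" eQTL per eGene per dataset and the uniform 500 kb cis/trans rule can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each association carries hg18 coordinates for the SNP and the eGene boundaries, measures distance to the nearest gene boundary, and uses the four location classes of Figure 3 (SNPs inside the eGene count as cis in the 91%/9% split). The `Association` record layout and all example values are hypothetical.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 500_000  # uniform cis threshold applied across all datasets


@dataclass(frozen=True)
class Association:
    """One reported SNP-transcript association (hypothetical record layout)."""
    dataset: str
    egene: str
    snp: str
    p_value: float
    snp_chrom: str
    snp_pos: int
    gene_chrom: str
    gene_start: int
    gene_end: int


def best_per_egene(associations):
    """Keep the most significant association for each (dataset, eGene) pair."""
    best = {}
    for assoc in associations:
        key = (assoc.dataset, assoc.egene)
        if key not in best or assoc.p_value < best[key].p_value:
            best[key] = assoc
    return list(best.values())


def location_class(assoc, window=CIS_WINDOW_BP):
    """Return 'in.gene', 'cis', 'trans' (same chromosome) or 'trans.diff.chr'."""
    if assoc.snp_chrom != assoc.gene_chrom:
        return "trans.diff.chr"
    if assoc.gene_start <= assoc.snp_pos <= assoc.gene_end:
        return "in.gene"
    if assoc.snp_pos < assoc.gene_start:
        distance = assoc.gene_start - assoc.snp_pos
    else:
        distance = assoc.snp_pos - assoc.gene_end
    return "cis" if distance <= window else "trans"


rows = [
    Association("d1", "GENE1", "rs_a", 1e-5, "1", 900, "1", 1_000, 2_000),
    Association("d1", "GENE1", "rs_b", 1e-9, "1", 1_500, "1", 1_000, 2_000),
    Association("d2", "GENE1", "rs_c", 1e-7, "2", 5_000, "1", 1_000, 2_000),
]
for kept in best_per_egene(rows):
    print(kept.dataset, kept.snp, location_class(kept))
```

Applying one window to re-annotated coordinates is what makes the cis/trans counts comparable across studies that originally used windows from 100 kb to 5 Mb.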

Master eQTLs with strong cis genetic influences across tissues

To assess the most ubiquitous eQTLs, we examined 33 eGenes whose expression was significantly associated with SNPs in ~70% of datasets (n ≥ 35) and performed unsupervised hierarchical clustering (Figure 2). Several eGenes showed strong genetic influences in more than 80% of datasets (n ≥ 42), including PEX6, GSTM3, PPIL3, MRPL43, and CHURC1. When compared against results from the GTEx (Genotype-Tissue Expression) project portal [40], 30 of these 33 eGenes had significant cis-eQTLs in 2 or more of the 9 independent tissues analyzed in that project (Table 2). The SNPs in Table 2 were checked for potential SNP-in-probe effects using PiPmaker [41]. None of the listed SNPs directly overlapped probes. Six of the SNPs had perfect proxy SNPs (r2 = 1.0) that overlapped one or more Affymetrix or Illumina probes (ACP6, ARNT, ITGB3BP, GSTM3, NDUFS5, THEM4), indicating that a small minority of these widespread cis-eQTLs may be influenced by SNP-in-probe effects.

Figure 2. Hierarchical clustering shows robust eGenes with strong genetic influences across a majority of studies. eGenes present in ~70% of datasets (≥35/53 datasets). Individual datasets are indicated at the bottom, with eGenes listed to the right. Presence (black) or absence (white) of eGenes as eQTLs within individual datasets is shown.

Table 2. Most frequently occurring cis-eGenes across all datasets

These genes may represent housekeeping or master cis-eGenes and could be useful positive controls in future studies. We next extended the clustering to 248 high-confidence eGenes found in ≥25 of our datasets [Additional file 3] and found that eQTLs clustered by tissue type but were also strongly influenced by overlapping study samples. For example, eQTLs from different brain anatomical sites derived from the same study samples clustered together, whereas an independent brain study that reported fewer eQTLs [28] fell in a cluster distinct from the largest brain eQTL study [31]. Three eQTL datasets from different blood cells that applied similarly stringent correction thresholds also clustered together [17]. Pathway and ontology analysis of the 248 clustered cis-eQTLs revealed enrichment of genes involved in antigen processing and presentation and immune function, glutathione S-transferase activity, and mitochondrial function [Additional file 4].
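Clustering datasets by which eGenes they report can be illustrated with a small agglomerative procedure. The paper does not state its distance metric or linkage method; this sketch assumes Jaccard distance between the sets of eGenes reported by each dataset and average linkage, and the dataset names and eGene sets are hypothetical. For real analyses, a library implementation such as SciPy's `scipy.cluster.hierarchy` module would be preferable.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of eGene symbols."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def average_linkage(profiles):
    """Agglomerative clustering of datasets (dict name -> set of eGenes).

    Returns the merge history as (members_left, members_right, distance).
    """
    names = sorted(profiles)
    dist = {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(names, 2)
    }

    def pair_distance(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    clusters = {i: (name,) for i, name in enumerate(names)}
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        best = None
        for ci, cj in combinations(list(clusters), 2):
            left, right = clusters[ci], clusters[cj]
            # Average linkage: mean pairwise distance between cluster members.
            d = sum(pair_distance(x, y) for x in left for y in right) / (
                len(left) * len(right)
            )
            if best is None or d < best[0]:
                best = (d, ci, cj)
        d, ci, cj = best
        merges.append((clusters[ci], clusters[cj], d))
        clusters[next_id] = clusters.pop(ci) + clusters.pop(cj)
        next_id += 1
    return merges


# Hypothetical presence sets: two brain sites from the same donors, one blood dataset.
profiles = {
    "brain_site1": {"G1", "G2", "G3"},
    "brain_site2": {"G1", "G2", "G3", "G4"},
    "blood": {"G2", "G5"},
}
for left, right, d in average_linkage(profiles):
    print(left, "+", right, f"{d:.3f}")
```

Datasets drawn from the same individuals share many eGenes and therefore merge early, which mirrors the sample-overlap effect described above.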

Additional file 3. Hierarchical clustering analysis of 248 eGenes found in ≥ 25/53 datasets used in pathway and ontology analyses. Clustering diagram of eGenes found in ≥ 25 datasets.

Format: DOC Size: 128KB

Additional file 4. Pathway and ontology analysis results for 248 most ubiquitous eGenes. Significantly enriched gene categories among highly repeated eGenes across tissues.

Format: XLSX Size: 13KB

We further characterized putative functional explanations for the 33 most ubiquitous cis-eGenes (Figure 2), whose gene symbols and basic functions are described in [Additional file 5]. All of the eQTL SNPs were common variants (the lowest MAF is 9% in CEU), and their effects were consistently large (Table 2). The most frequent eQTL across datasets was often not the strongest eQTL but was highly correlated with it, with a few exceptions (NUDT2 pairwise r2 = 0.08, NQO2 r2 = 0.11, MYOM2 r2 = 0.17, GSTM3 r2 = 0.20). These exceptions may reflect coverage differences across studies or allelic heterogeneity of functional variants at some loci. A functional characterization of all SNPs in Table 2 and their perfect proxies (r2 = 1.0 in 1000 Genomes phase I European samples [42]) indicated that ~2/3 of loci had a perfectly correlated nonsynonymous SNP (nSNP), splice-site SNP or UTR SNP, although functional interpretation was not always straightforward because some loci carried multiple SNPs with putative function. We queried the SNPs in Table 2 against ENCODE regulatory features using RegulomeDB [43]. Most of the loci in Table 2 had one or more strong eQTLs directly overlapping ENCODE regulatory features (e.g., predicted transcription factor binding sites, footprinting motifs, chromatin structure features and/or protein binding (ChIP-seq) features) [Additional file 6], suggesting that many of them are likely functional regulatory variants. For example, rs3768324, the strongest observed eQTL for NDUFS5 in 8 datasets, overlapped abundant regulatory features, including ChIP-seq peaks for POL2, SRF, PAX5 and ELK4, and lay close to the transcription start site.
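The pairwise r2 values quoted above measure LD between the most frequent and the strongest eQTL SNP at a locus. The paper takes LD from reference panels (HapMap CEU, 1000 Genomes); as an illustrative approximation, this sketch computes genotype-based r2 as the squared Pearson correlation of 0/1/2 allele dosages across individuals. It is not a haplotype-based LD estimate, the tolerance used to call a "perfect proxy" is an assumption, and the dosage vectors are hypothetical.

```python
def genotype_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between allele dosages (0, 1, 2) at two SNPs."""
    n = len(dosage_a)
    if n != len(dosage_b) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(dosage_a, dosage_b))
    var_a = sum((x - mean_a) ** 2 for x in dosage_a)
    var_b = sum((y - mean_b) ** 2 for y in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


def is_perfect_proxy(r2, tol=1e-9):
    """Treat r2 = 1 (within floating-point tolerance) as a perfect proxy."""
    return r2 >= 1.0 - tol


# Hypothetical dosages for six individuals.
lead = [0, 1, 2, 0, 1, 2]
proxy = [2, 1, 0, 2, 1, 0]   # opposite allele coding carries the same information
other = [0, 0, 0, 2, 2, 2]
print(genotype_r2(lead, proxy), genotype_r2(lead, other))
```

Because r2 squares the correlation, allele coding (which allele is counted) does not matter, which is why perfect proxies can be found regardless of strand or reference allele.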

Additional file 5. Full gene names and descriptions for 33 eGenes significant in ≥35 datasets.

Format: XLSX Size: 13KB

Additional file 6. Overlap of master-cis and trans-eQTLs with ENCODE regulatory features. Intersection of master-cis and trans-eQTLs with ENCODE regulatory features (transcription factor position weight matrices, DNA footprinting motifs, chromatin structure, protein binding by ChIP-seq) as determined with RegulomeDB queries.

Format: XLSX Size: 17KB

Long-range cis and trans-chromosomal eQTL results

Thirty-seven eGenes had trans-associations (>500 kb from the eGene to the eQTL, or the eQTL on a different chromosome) in 4 or more datasets (Table 3). The 4-dataset threshold was selected to reduce the effects of intra-study sample correlation, since most eQTL publications contain ≤3 tissues from the same individuals. At least half of the 37 trans-eGenes appeared to be long-range cis associations (>500 kb on the same chromosome), and several appeared to be possible misinterpretations due to genes that map to multiple genomic locations. Among eGenes/eQTLs on different chromosomes, there were several known and replicated trans-eQTL loci, e.g., the MHC class II region on chr6 [20], the MAPT region on chr17 [44,45], and the BCL11A/HBG beta-globin interaction [20,46]. A single chr12 SNP, rs10876864, exhibited strong trans associations with 9 targets on 9 different chromosomes in 4 distinct tissues: liver, omental adipose, blood cells and prefrontal cortex. The same variant also showed strong cis associations with RPS26 and, to a lesser degree, SUOX [Additional file 7], and was associated with vitiligo [47]. Notably, this variant is in high LD with rs11171739 (r2 = 0.86 in CEU), which was previously implicated in blood cell cis association with RPS26 and SUOX, in trans association with several targets, and in GWAS association with type 1 diabetes [20,48]. Of the two variants, rs10876864 had strong cis and trans associations in a broader range of tissues and aligned with histone signatures and >25 ChIP-seq binding signals [Additional file 6]. Additionally, rs10876864 is in perfect LD (r2 = 1 in CEU) with rs1131017, a SNP absent from all commercial genotyping arrays that is positioned near the transcription start site of RPS26. Many of the SNPs or proxies in Table 3 also overlapped ENCODE regulatory features based on RegulomeDB queries [Additional file 6].
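The recurrence filter for trans-eQTLs (≥4 datasets) and the summary of a "hub" SNP with targets on many chromosomes can be sketched as simple counting over best trans associations. The input tuple layout and example rows are hypothetical, and counting distinct datasets, as here, does not by itself remove the intra-study sample overlap that the threshold is meant to guard against.

```python
from collections import defaultdict

MIN_DATASETS = 4  # recurrence threshold used for Table 3


def recurrent_trans(associations, min_datasets=MIN_DATASETS):
    """associations: iterable of (dataset, snp, egene, egene_chrom) for trans-eQTLs.

    Returns {(snp, egene): n_datasets} for pairs seen in >= min_datasets datasets.
    """
    datasets = defaultdict(set)
    for dataset, snp, egene, _ in associations:
        datasets[(snp, egene)].add(dataset)
    return {pair: len(ds) for pair, ds in datasets.items() if len(ds) >= min_datasets}


def target_chromosomes(associations):
    """Map each SNP to the set of chromosomes carrying its trans targets."""
    chroms = defaultdict(set)
    for _, snp, _, egene_chrom in associations:
        chroms[snp].add(egene_chrom)
    return dict(chroms)


# Hypothetical trans associations.
trans_rows = [(f"d{i}", "rs_hub", "GENE_T", "7") for i in range(1, 5)] + [
    ("d1", "rs_hub", "GENE_U", "3"),
    ("d2", "rs_other", "GENE_V", "9"),
]
print(recurrent_trans(trans_rows))
print(target_chromosomes(trans_rows))
```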

Table 3. trans-eQTLs (>500 kb) observed in 4 or more datasets

Additional file 7. Trans-eQTL and cis-eQTL associations in chr12q13.2 region.

Format: XLSX Size: 12KB

Additional file 8. Trans-eQTL loci results (for loci summarized in Table 3). Individual trans-eQTL loci results for those loci summarized in Table 3.

Format: XLSX Size: 32KB

Our cross-dataset analysis also highlighted some interesting potential new trans signals. Target transcripts and tissue associations are fully described in [Additional file 8]. One set of correlated trans-eQTLs on chr19p12 localized near the zinc finger (ZNF) gene ZNF429, within a large ZNF cluster containing many genes. Notably, the correlated eQTLs in this region were specifically associated in trans with the expression of zinc finger genes elsewhere in the genome, including 4p16.3 (ZNF595), 7p11.2 (ZNF479), 7q11.21 (ZNF679), and within 19p12 (ZNF99, ZNF486). However, BLAT analysis [49] revealed that the chr4 and chr7 transcripts map with 83.5%-85.1% identity to the 19p12 region, suggesting that gene homology and probe cross-hybridization could be responsible for the apparent trans associations. A SNP on chromosome 11, rs10902222, demonstrated strong cis associations mainly with PNPLA2 and RPLP2, as well as trans associations with 3 different target regions (LRFN1, HCN2, FAM27B). A BLAT analysis of the SNP and the associated transcripts did not show homology, indicating that this may represent a new trans-eQTL locus [Additional file 9].

Additional file 9. Putative novel trans-eQTL and results at chr 11p15.5. All cis and trans results for 11p15.5 are displayed.

Format: XLSX Size: 11KB

We additionally searched for distant eQTLs with P < 5E-8 in 1 or more datasets that overlapped long-range regulatory interaction sites in ENCODE chromosome conformation capture carbon copy (5C) data [50]. Two SNPs had evidence for both long-range interactions and eQTL association at this stringent threshold. Both SNPs were associated with expression in subcutaneous adipose (rs932562, P < 2.9E-22 for WFDC2 (10.2 Mb away) [9]; rs1045001, P < 1.9E-8 for RHBDL1 (0.62 Mb away) [19]) [Additional file 10]. However, the 5C interactions for both SNPs were more localized (up to 150 kb and 450 kb, respectively) than the eQTL associations (10.2 Mb and 6.6 Mb away) [Additional file 10]. Both variants also exhibit more localized, strong cis associations in other tissue datasets. This suggests that medium-range regulatory effects of these variants, possibly corresponding to features identified by 5C, may in turn influence longer-range gene regulation megabases away.
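The comparison between 5C interaction range and eQTL distance can be sketched as an interval check: find 5C interactions with one fragment containing the eQTL SNP, measure how far away the partner fragment lies, and compare that reach with the SNP-to-eGene distance. The `Interaction` record, its fields, and the coordinates below are hypothetical; ENCODE 5C files would first need to be parsed into this form.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """A 5C interaction between two fragments on one chromosome (hypothetical layout)."""
    chrom: str
    a_start: int
    a_end: int
    b_start: int
    b_end: int


def distance_to_interval(pos, start, end):
    """0 if pos lies inside [start, end], else bp to the nearest edge."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def max_5c_reach(snp_chrom, snp_pos, interactions):
    """Largest SNP-to-partner-fragment distance over interactions anchored at the SNP.

    Returns None when no interaction fragment contains the SNP.
    """
    reaches = []
    for it in interactions:
        if it.chrom != snp_chrom:
            continue
        if it.a_start <= snp_pos <= it.a_end:
            reaches.append(distance_to_interval(snp_pos, it.b_start, it.b_end))
        elif it.b_start <= snp_pos <= it.b_end:
            reaches.append(distance_to_interval(snp_pos, it.a_start, it.a_end))
    return max(reaches, default=None)


interactions = [Interaction("chr20", 1_000, 2_000, 150_000, 151_000)]
reach = max_5c_reach("chr20", 1_500, interactions)
eqtl_distance = 10_200_000  # illustrative SNP-to-eGene distance, in bp
print(reach, "bp 5C reach vs", eqtl_distance, "bp eQTL distance")
print("5C alone spans the eQTL distance:", reach is not None and reach >= eqtl_distance)
```

When the reach is far shorter than the eQTL distance, as in this toy case, the 5C contact alone cannot explain the association, which is the pattern described for both SNPs above.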

Additional file 10. Long-range cis-eQTLs (P < 5E-8) and their short- and long-range cis-eQTL associations. Short- and long-range cis-eQTL associations for chromosome 16 and 20 regions with associations overlapping ENCODE 5C (chromosome conformation capture carbon copy) interactions in lymphoblastoid cell lines.

Format: XLSX Size: 13KB

Significance of eQTLs relative to distance from eGenes

The strength of eQTL signals was correlated with the distance of the eQTL from its associated eGene boundary. Among 62,872 unique strongest cis- or trans-eQTLs, the majority (89%) were located within cis regions (cis-acting SNPs) (Figure 3), consistent with past reports [2]. Significance of eQTLs, as measured by P-values, dropped sharply with distance from gene boundaries (median dataset kurtosis = 11), both upstream and downstream of eGene coding regions (Figure 4A), indicating that eQTLs closer to their associated transcripts have higher significance. Individual dataset distributions, split into 24 brain-related, 14 blood, 5 liver, 3 fat and 7 other tissue datasets, are shown in [Additional file 11]. Distributions for individual datasets were consistently kurtotic, with only a slight bias in the 5′ direction (median skewness = -0.032, mean SNP distance from gene = -1,356 bp). Results focused on 5′ transcription start site regions alone showed a strong central tendency within ±5 kb, with a slight preference toward the downstream exon 1 or 5′ UTR direction (Figure 4B).
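The kurtosis and skewness values summarize the shape of each dataset's signed eQTL-eGene distance distribution. This sketch assumes a strand-aware sign convention (negative = upstream/5′, positive = downstream/3′, 0 = inside the gene) and moment-based (population) estimators; whether the reported kurtosis is raw or excess kurtosis is not stated, so both are returned. All coordinates are hypothetical.

```python
def signed_distance(snp_pos, gene_start, gene_end, strand):
    """SNP-to-eGene distance: <0 upstream (5'), >0 downstream (3'), 0 inside the gene."""
    if gene_start <= snp_pos <= gene_end:
        return 0
    if strand == "+":
        return snp_pos - gene_start if snp_pos < gene_start else snp_pos - gene_end
    if strand == "-":
        return gene_end - snp_pos if snp_pos > gene_end else gene_start - snp_pos
    raise ValueError("strand must be '+' or '-'")


def shape_stats(values):
    """Moment-based skewness, raw kurtosis and excess kurtosis (raw - 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("shape statistics need non-constant values")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    kurt = m4 / m2 ** 2
    return m3 / m2 ** 1.5, kurt, kurt - 3.0


# Hypothetical (snp_pos, gene_start, gene_end, strand) rows.
examples = [
    (9_000, 10_000, 20_000, "+"),    # 1 kb upstream of a + strand gene
    (15_000, 10_000, 20_000, "+"),   # inside the gene
    (21_500, 10_000, 20_000, "-"),   # 1.5 kb upstream of a - strand gene
    (8_000, 10_000, 20_000, "-"),    # 2 kb downstream of a - strand gene
    (400_000, 10_000, 20_000, "+"),  # distant downstream SNP
]
distances = [signed_distance(*row) for row in examples]
skew, kurt, excess = shape_stats(distances)
print(distances, f"skew={skew:.2f} kurtosis={kurt:.2f} excess={excess:.2f}")
```

A distribution concentrated near 0 with a few distant SNPs, as here, yields high kurtosis; a skewness near 0 indicates little net 5′ or 3′ bias.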

Figure 3. eQTL-eGene distance distributions relative to datasets and tissue group. Common SNP and transcript annotations were used to re-annotate all datasets, and eQTL locations were categorized as: in the eGene, cis (≤500 kb from eGene), trans (>500 kb but on the same chromosome), or trans.diff.chr (eQTL and eGene map to different chromosomes).

Figure 4. Significance of eQTLs relative to distance from eGene boundaries. A: 116,563 best eQTLs per eGene per dataset are shown across all 53 eQTL datasets. eQTLs located in their eGenes are plotted at 0 on the x-axis; otherwise, the x-axis indicates the distance of each eQTL to its eGene (from 5′: -1 Mb to 3′: +1 Mb). Not shown are 393 eQTLs with P < 1 × 10^-150, which also display a highly central tendency. B: A histogram of the number of eQTLs per kb of distance from the 5′ transcription start sites (TSS) of eGenes.

Additional file 11. Significance of eSNPs relative to distance from their associated eGenes for different tissue types. Panel A: blood tissues and cell types (n = 14 datasets); Panel B: brain tissues (n = 24 datasets); Panel C: liver (n = 5 datasets); Panel D: fat-related (n = 3 datasets); Panel E: other tissues (n = 7 datasets). The y-axis is truncated at P < 1E-150, obscuring a small proportion of results.

Format: DOC Size: 1.3MB

A minority of SNPs >500 kb away from their associated eGenes were highly significant (0.5% with P < 1 × 10^-50; 13.4% with P < 5 × 10^-8) (Figure 4A). Nonetheless, there were 7,075 significant eQTLs >500 kb from their associated eGene. The relative proportions of SNPs mapping within the genes they are associated with, cis (1 bp-500 kb), trans (same chromosome, >500 kb) and trans (different chromosome) are shown in Figure 3. Comparison across major tissue groups indicated an enrichment of trans (different chromosome) results in brain eQTLs relative to other tissue types (e.g., P < 0.002 relative to blood eQTLs).

Enrichment of eQTLs within regulatory, selection and chromosomal features

To understand the spectrum of potential cis- and trans-acting regulatory mechanisms across the human genome, we mapped eQTLs to regulatory features from a variety of sources. A total of 62,872 unique best eQTLs were aligned against 22 regulatory feature datasets. Binomial tests indicated that these unique best eQTLs are located within several regulatory features more often than expected by chance (P < 0.01 for 14 of the 22 regulatory features; Table 4). Many of these features tend to co-localize with coding gene regions, so some overlap is expected given the tendency of eQTLs to lie close to their associated eGenes. After adjustment for a variety of features, cis-eQTLs were most abundant (in order) on chromosomes 22, 21, 6, 20, 10 and 19, and least abundant (in order) on chromosomes Y, X, 7 and 3 [Additional file 12].
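A binomial enrichment test for one regulatory feature compares the observed number of best eQTLs falling in the feature with the number expected under a background rate p. This sketch computes the one-sided exact P(X ≥ k) for X ~ Binomial(n, p) in log space, which avoids overflow at n ≈ 63,000. The background p (here, a hypothetical fraction of all genotyped SNPs lying in the feature) is an assumption, since the paper does not state how expectations were derived, and the counts are illustrative.

```python
import math


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """One-sided exact P(X >= k), summed with the log-sum-exp trick."""
    if not 0.0 < p < 1.0:
        raise ValueError("background rate p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    return min(1.0, math.exp(peak) * sum(math.exp(v - peak) for v in logs))


# Hypothetical feature covering 5% of SNPs; 3,300 of 62,872 best eQTLs fall in it.
n_eqtls, in_feature, background = 62_872, 3_300, 0.05
print(f"expected {n_eqtls * background:.0f}, observed {in_feature}, "
      f"P = {binom_upper_tail(in_feature, n_eqtls, background):.2e}")
```

As the text notes, a small P here does not separate regulatory function from proximity to genes; a gene-distance-matched background would be needed for that.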

Table 4. eQTLs compared to human genome regulatory features.

Additional file 12. cis-eQTL representation by chromosome (relative to length, gene #, RNA #, variation #). Proportion of unique best cis- and trans-eQTLs by autosomal and sex chromosome. Proportions after adjustment for chromosome length, number of CCDS genes, total HuRef human RNA lengths, and number of HuRef variants are displayed, along with overall mean ranks from most to fewest cis-eQTLs per chromosome across all adjustments.

Format: XLSX Size: 12KB

Housekeeping genes are more often eQTLs

When a gene is expressed in multiple tissues or cells at relatively constant levels, its regulatory control may be shared across those tissues. To investigate the relationship between housekeeping and non-housekeeping eGenes, we categorized them based on a previous analysis of publicly available expression data in 18 human tissues [51]. Of the 19,038 unique eGenes in our study, 2,207 were defined as housekeeping genes and 16,831 as non-housekeeping genes. A density plot showed that housekeeping eGenes are overrepresented, relative to non-housekeeping eGenes, in the right tail of the distribution, i.e., among eGenes found in many datasets (Figure 5, P < 1.12 × 10^-11, Student’s t-test).
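The housekeeping comparison can be framed as a two-sample Student's t-test on the number of datasets in which each eGene appears. This sketch uses the pooled-variance t statistic and, assuming the degrees of freedom are large (about 19,000 here), approximates the two-sided P-value with the standard normal distribution; that approximation is poor for small samples such as the hypothetical lists below, where a t distribution should be used instead.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


def two_sided_p_normal(t):
    """Two-sided P-value under a standard normal approximation (large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical counts of datasets (out of 53) in which each eGene is an eQTL.
housekeeping = [30, 12, 25, 40, 18, 22]
non_housekeeping = [3, 1, 8, 15, 2, 5, 11, 4]
t, df = pooled_t(housekeeping, non_housekeeping)
print(f"t = {t:.2f}, df = {df}, approx. P = {two_sided_p_normal(t):.2e}")
```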

Figure 5. Housekeeping genes are over-represented among eGenes common to many tissue datasets. A density plot of eGenes that are housekeeping versus non-housekeeping genes (as defined by [51]) across datasets. The eGene distributions differ significantly (P < 1.12 × 10^-11).

Expression QTL concordance with GWAS peak signals

Expression QTLs from the current study were compared against the NHGRI GWAS catalog. Since many eQTL studies did not conduct imputation, we also assessed the overlap with perfect LD proxies (r2 = 1) of the GWAS catalog SNPs [42]. Of 8,845 unique GWAS SNPs, 926 were directly found among the 62,872 unique best eQTLs (~10.5% overlap) [Additional file 13]. For these 926 shared SNPs, there was a significant positive correlation in strength of signal (assessed by P-values) between reported eQTL and trait GWAS associations (Spearman’s P = 2.75 × 10^-26; [Additional file 14]). When LD partners (r2 = 1) were incorporated, ~22% of GWAS catalog signals corresponded to a best eQTL association in our database. Because the NHGRI catalog is limited to selected top results, we further compared both eQTL and nSNP distributions within the test distributions of 45 full GWA scans for a variety of human diseases and dichotomous and quantitative traits. For most GWA scans (n = 38/45), we found significant enrichment of eQTL SNPs among significant GWA results across the full test statistic distributions [Additional file 15]. Non-synonymous SNPs were enriched in fewer scans (n = 13) and were significantly depleted in some scans (n = 2). This pattern persisted in the significant tail of the distribution (limiting to GWAS P < 1E-2), where 25 of 45 GWA scans were enriched for eQTL SNPs, whereas only 3 showed enrichment for nSNPs and 11 indicated depletion of nSNPs among significant results.
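The Spearman correlation between eQTL and GWAS signal strength depends only on ranks. Applying −log10 to both P-value vectors reverses both orderings, so the coefficient is the same whether computed on P-values or on −log10 P-values. A minimal sketch with average ranks for ties follows; the paired P-values are hypothetical, and the significance test for ρ is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical P-values for SNPs that are both best eQTLs and GWAS catalog hits.
eqtl_p = [1e-30, 1e-12, 1e-8, 1e-6, 1e-20]
gwas_p = [1e-25, 1e-9, 1e-10, 1e-7, 1e-15]
neg_log_eqtl = [-math.log10(p) for p in eqtl_p]
neg_log_gwas = [-math.log10(p) for p in gwas_p]
print(spearman_rho(eqtl_p, gwas_p), spearman_rho(neg_log_eqtl, neg_log_gwas))
```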

Additional file 13. Comparison of eQTL results to NHGRI GWAS catalog SNPs. Comparison of eQTL results (all or best eSNPs and their perfect proxies in HapMap CEU) to NHGRI GWAS catalog SNPs.

Format: XLSX Size: 9KB

Additional file 14. Correlation between eQTL and GWAS p-values in the NHGRI GWAS catalog. The correlation in strength of signal (represented by –log10 P-value) between reported eQTL studies and trait GWAS associations represented in the NHGRI GWAS catalog.

Format: DOC Size: 55KB

Additional file 15. Enrichment or depletion of nSNPs (n = 100,601) and eQTLs (n = 62,872 best) among 45 full trait GWAS scans. PubMed identifiers and GWAS traits are given for 45 full GWAS scans whose results were compared to nSNPs (n = 100,601) and eQTLs (n = 62,872 best eSNPs). Genomic inflation factors (λ) are given for each trait, for nSNPs and eQTLs, both for the full scans and at a threshold of P < 1E-2 in the GWAS. Kolmogorov-Smirnov (K-S) test p-values for differences in distributions are given. Enrichments are highlighted in blue and depletions in grey, with significant K-S tests in red and non-significant ones in green.

Format: XLSX Size: 17KB
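The Kolmogorov-Smirnov comparisons in Additional file 15 ask whether GWAS P-values for eQTL SNPs (or nSNPs) are distributed differently from those of other SNPs in the same scan. A minimal two-sample sketch follows: it computes D = max |F_a(x) − F_b(x)| and an asymptotic two-sided P-value from the Kolmogorov series with the commonly used correction λ = (√n_e + 0.12 + 0.11/√n_e)·D, where n_e = n_a·n_b/(n_a + n_b). Exactly which SNP sets were compared, and whether one- or two-sided tests were used, is not stated, so the inputs are hypothetical. D is unsigned; whether a difference is an enrichment or a depletion must be read from which empirical CDF lies above the other.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x (handles ties).
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def ks_pvalue(d, na, nb, terms=100):
    """Asymptotic two-sided P-value from the Kolmogorov distribution."""
    ne = na * nb / (na + nb)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and Q(lam) is ~1 anyway
    q = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))


# Hypothetical GWAS P-values for eQTL SNPs and for other SNPs in one scan.
eqtl_snps = [0.001, 0.004, 0.02, 0.03, 0.1, 0.2, 0.35, 0.5]
other_snps = [0.01, 0.08, 0.15, 0.3, 0.45, 0.6, 0.7, 0.8, 0.9, 0.95]
d = ks_statistic(eqtl_snps, other_snps)
print(f"D = {d:.3f}, P ~ {ks_pvalue(d, len(eqtl_snps), len(other_snps)):.3g}")
```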

Novel plaque and kidney eQTLs linked to GWAS results

To our knowledge, the plaque and kidney eQTLs in this study are the first reported for these tissues. We queried eQTLs from these tissues against non-anthropometric GWAS results in the GRASP database. Results are reported for kidney in [Additional file 16] and for peripheral artery plaque in [Additional file 17]. Serum creatinine and creatinine-estimated glomerular filtration rate are associated with rs835223 [52], which is also associated here with DAB2 expression levels in kidney (P < 1.4E-5). Antibodies in systemic lupus erythematosus (SLE) accumulate in tissues including the kidney glomeruli. SNP rs7808907 is associated with IRF5 expression levels in kidney (P < 3.9E-13) and was previously associated with anti-double-stranded DNA autoantibody status in SLE [53].
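The kidney and plaque lookups amount to joining eQTL SNPs to GWAS results by rsID at a genome-wide threshold (P < 5E-8, as in Additional files 16 and 17), optionally expanding each eQTL SNP to its perfect LD proxies (r2 = 1), as was done for the NHGRI catalog comparison. This is a minimal in-memory sketch; the tuple layouts, the proxy map, and all rsIDs, traits and P-values in the example are hypothetical and do not reflect the GRASP schema.

```python
GWAS_P_MAX = 5e-8  # genome-wide significance threshold for GWAS hits


def intersect_eqtl_gwas(eqtls, gwas_hits, proxies=None, p_max=GWAS_P_MAX):
    """Join eQTLs to GWAS hits on the SNP itself or any of its perfect proxies.

    eqtls:     iterable of (snp, egene, eqtl_p)
    gwas_hits: iterable of (snp, trait, gwas_p)
    proxies:   optional dict snp -> iterable of SNPs in perfect LD (r2 = 1)
    Returns (eqtl_snp, matched_snp, egene, eqtl_p, trait, gwas_p) rows,
    most significant GWAS result first.
    """
    proxies = proxies or {}
    gwas_by_snp = {}
    for snp, trait, p in gwas_hits:
        if p < p_max:
            gwas_by_snp.setdefault(snp, []).append((trait, p))
    rows = []
    for snp, egene, eqtl_p in eqtls:
        for matched in {snp, *proxies.get(snp, ())}:
            for trait, gwas_p in gwas_by_snp.get(matched, []):
                rows.append((snp, matched, egene, eqtl_p, trait, gwas_p))
    rows.sort(key=lambda r: (r[5], r[0], r[1], r[4]))
    return rows


# Hypothetical inputs for illustration only.
eqtls = [("rs_e1", "GENE_A", 1e-13), ("rs_e2", "GENE_B", 1e-5)]
gwas = [("rs_e1", "trait X", 2e-9), ("rs_p2", "trait Y", 3e-12), ("rs_e2", "trait Z", 1e-4)]
proxies = {"rs_e2": ["rs_p2"]}
for row in intersect_eqtl_gwas(eqtls, gwas, proxies):
    print(row)
```

In this toy case the second eQTL SNP reaches a genome-wide GWAS hit only through its proxy, which is the situation that proxy expansion is meant to capture when studies lack imputation.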

Additional file 16. Kidney eQTLs reported in this study and association with GWAS traits (P < 5e-8). Kidney eQTLs reported in this study were queried against the NHLBI GRASP GWAS database for overlaps. All GWAS intersections are given, and GWAS results with particular relevance to renal function (serum creatinine, SLE and eGFR) are highlighted.

Format: XLSX Size: 13KB

Additional file 17. Peripheral plaque eQTLs reported in this study and association with GWAS traits (P < 5e-8). Plaque eQTLs reported in this study were queried against the NHLBI GRASP GWAS database for overlaps. All GWAS intersections are given, and several associations with coronary artery disease and myocardial infarction are highlighted.

Format: XLSX Size: 38KB

SNP rs2133189 was previously linked to coronary artery disease (CAD) susceptibility [54] and here is strongly associated with AIDA expression levels in peripheral artery plaque (P < 2.1E-20). Other peripheral plaque eQTLs for SNPs previously linked to CAD or myocardial infarction include BTN3A1 (rs6929846 eQTL P < 2.8E-07, myocardial infarction P < 3.5E-24 [55]), ZNF344 (rs4803750 eQTL P < 3.8E-05, atherogenic dyslipidemia P < 1.3E-33 [56]), NBEAL1 (rs6725887 eQTL P < 2.7E-06, CAD P < 1.1E-09 [57]) and ENST00000318084 (rs10764881 eQTL P < 2.7E-05, CAD P < 1.4E-09 [58]).

Discussion

In this study, we systematically characterized and annotated eQTL results from 53 genome-wide gene expression GWAS datasets. Overall, 19,038 genes had at least one eQTL significantly associated with their expression. Even if a substantial proportion of these represent false discoveries, a large proportion of human genes appear to be subject to common genetic influences on their expression level, consistent with prior surveys using sensitive allele-specific expression methods [6,59]. Because few studies have explicitly assessed genome-wide genetic effects on splicing and alternative isoforms in human tissues, many additional genetic effects on expression likely remain to be discovered. Regional cis-eQTLs predominate over trans-eQTLs genome-wide, although limited statistical and computational power has hampered trans-eQTL discovery and validation.

We identified many cis-eQTLs and several trans-eQTLs with evidence for consistent association across more than one study or tissue. These human master cis- and trans-eQTLs may serve as positive controls in future studies and may reveal important aspects of regulatory interactions, human biology and evolution. Furthermore, researchers searching for tissue-specific eQTLs could screen their results against the results we collated and deposited in the GRASP database, to check for prior evidence of association in other tissues before claiming tissue specificity. The strong effects and common allele frequencies of these variants may also make them useful for sample forensics in expression-based research [60].

Ubiquitous cis-eQTLs were enriched for housekeeping genes, consistent with a prior study [61], and for several biological categories including antigen presentation, mitochondrial function and glutathione S-transferase activity. We speculate that these strong cis-eQTLs of common allele frequency could represent beneficial alleles that arose during human evolution and may enhance immune function, mitochondrial function and xenobiotic metabolism. Glutathione S-transferases detoxify many compounds, and five such transcripts were found among strong cis-eQTLs (1p13.3: GSTM1, GSTM3, GSTM4; 22q11.23: GSTT1; 10q25.1: GSTO2). GSTM1 and GSTT1 have previously been reported to be subject to copy number variation influencing gene expression [62,63]. Results integrated across studies here reveal that other members of the glutathione S-transferase family are also subject to strong genetic regulation. Mitochondrial-associated transcripts were significantly enriched, making up 12.1% of the cis-eGenes present in ≥25 datasets. These include genes that encode mitochondrial proteins involved in the electron transport chain and ATP synthesis (NDUFS5, COX7A2L, ATP5S), membrane functions (AKAP10, FECH, SURF1, TIMM10), transport (SLC25A16), and mitochondrial protein synthesis (MRPL19, MRPL21, MRPL43). While overall eQTL results were not enriched for overlap with selection features as defined by integrated haplotype scores or fixation index (FST), several of the master eQTL regions correspond to regions identified as containing human lineage-specific events [64]. These include CDK5RAP2, which appears to be under positive selection and may be involved in increased human brain size [65,66], and the SRGAP2 and NBPF gene cluster on chromosome 1, which shows human lineage-specific copy number increases and is suspected to play a role in increased neuronal branching during development [67-69].

We examined positional effects of eQTLs with respect to associated transcripts, regulatory features and chromosomes. The strongest eQTLs cluster around their associated gene transcript regions, a pattern that appears universal across tissues and datasets and is consistent with prior reports considering smaller numbers of tissues (e.g., [17]). A variety of regulatory features overlap eQTLs more than expected by chance, as others have also reported [70,71]. This is partially expected, given that both these features and eQTLs are concentrated near genes. Features that lacked significant enrichment among eQTLs included microRNA coding regions and targets, human enhancer regions and non-coding RNAs. These features may therefore account for a smaller proportion of functional genetic regulation of gene expression. This may reflect their more distant location from coding genes (i.e., enhancers, non-coding RNAs), but could also suggest lower tolerance of functional variation in these features. Analysis across chromosomes reveals that chromosomes 21 and 22, in particular, display higher rates of cis-eQTLs after adjusting for a number of factors including gene number, coding length and number of variants. Notably, chromosomes 21 and 22 have been subject to major shifts in primate and human evolution [72].

Unlike the abundant cis-eQTLs, there appear to be few trans-eQTL hotspots across the genome. Many studies have chosen not to calculate long-range cis- or trans-eQTL effects. Furthermore, given the large multiple testing burden, discriminating true positives from false positives is challenging, particularly with limited statistical power and when replication is not attempted. Homologous transcript mapping and cross-hybridization artifacts may also confound trans-eQTL discovery in some cases. Nonetheless, a few trans-acting regions have emerged with consistent evidence across a number of studies, including the HLA region (6p21.32), ARHGEF3 (3p14.3), the MAPT region (17q21.31), HBG (11p15.4), SUOX-IKZF4-RPS26 (12q13.2), and now RPLP2-PNPLA2 (11p15.5). Most of these regions have been implicated by human disease GWAS. Combining data across studies and tissues may help resolve mechanisms, key targets, and the extent of targeted expression networks. For example, our study suggests that RPS26-associated variants may be the key trans regulators at 12q13.2. Data from subcutaneous adipose tissue included in the current study suggest that rs4731702 near KLF14 (7q32.3) is associated in trans with SLC7A10 expression, which supports SLC7A10 as an important trans adipose target associated with metabolic traits, as previously suggested [73]. Finding and validating more trans-eQTLs may require greater sample sizes, or other approaches such as analysis of co-expressed modules [48], multi-species studies or functional screens.

Prior studies suggested enrichment of eQTLs among some full GWAS scans and among the most significant GWAS results. Here we examined a greater number of tissue eQTLs and GWAS results. Among 45 full human GWAS scans of disease and non-disease traits, we observe a consistent pattern of enrichment of eQTLs above and beyond that of nonsynonymous SNPs, both in the full scans and across the significant tail of the statistical distributions. This suggests that eQTLs contribute to the multi-genic nature of many complex human traits and may account for a greater proportion of variance than protein-coding variation [74]. In an analysis focused on the strongest GWAS results from the NHGRI catalog, we observe significant correlation between the strength of signal for GWAS and expression traits. Concordant strongest GWAS and eQTL SNPs establish a conservative floor indicating that ~10% of GWAS phenotype signals are likely directly attributable to genetic regulation of expression. The true proportion of functional regulatory variants is likely much higher, given functional alleles in LD with the reported SNPs and the incomplete coverage of variants, human populations, alternative splicing, non-coding RNAs and tissue-specific expression in the available eQTL results. Overall, these results imply that eQTLs will remain a critical component in interpreting genetic associations and prioritizing replication candidates for a variety of traits.

The addition of new tissue eQTLs may continue to suggest new mechanisms or reinforce prior hypotheses for functional variants. Here we report the first human kidney and plaque eQTLs. Kidney eQTLs corresponded with several prior kidney-related GWAS findings. Several peripheral plaque eQTLs involved variants previously associated in GWAS of coronary artery disease or myocardial infarction. Notably, a prior study reported rs6929846 to be associated with myocardial infarction in a Japanese GWAS sample and replicated the finding in a subsequent Japanese sample [55]. Yamada et al. also provided evidence that rs6929846 affects BTN2A1 transcription, and showed immunohistological positivity for BTN2A1 in human myocardial infarction lesions and in coronary endothelium, arterioles and capillaries [55]. Our study links the same SNP to expression levels of nearby BTN3A1 in peripheral artery plaque (P < 2.8E-7). This locus contains 6 butyrophilin genes and 1 butyrophilin pseudogene. Together, these results suggest that butyrophilin genes may play roles in coronary artery disease pathogenesis, possibly through roles in antigen presentation and T cell stimulation [75].

Beyond limitations in the analysis of trans-eQTLs, this study has several significant limitations. The full gene expression-SNP datasets are generally unavailable, so the current catalog is limited to significant results available from individual studies, and probe annotations are often missing, limiting precise localization and assessment of potential probe artifacts. The studies are biased mainly toward more readily available tissues, including blood, B-lymphoblastoid cell lines and brain autopsy tissues. Studies were further biased by their non-uniform transcript and genetic content and statistical power. Overall, these limitations suggest the current database is most likely prone to false negatives; thus, lack of association at a specific locus cannot be viewed as definitive.

The decreasing cost of genome-wide genotyping, sequencing and expression profiling means that larger sample sizes are increasingly feasible for eQTL studies. Applying RNA sequencing to eQTL studies may increase discoveries, particularly of genetically regulated alternative splicing [3,4]. While still at an early stage, the study of additional RNA types, such as long non-coding RNAs [76] and microRNAs and their targets [77,78], and of their corresponding tissue-specific QTLs is leading to new insights. Deeper profiling of eQTLs via dense imputation with a modern 1000 Genomes-based genetic map should increase eQTL discovery and improve fine mapping, as recently demonstrated [79]. Profiling a greater proportion of human tissues, as undertaken by the GTEx project, should further aid in defining tissue-specific eQTLs [80]. These are important goals, since eQTLs seem to account for a significant proportion of human phenotypic and disease variability. Many areas require further study at the population level, including detailed probing of extensive tissue and cell types, and ascertainment of QTLs related to splicing [4,24], RNA decay mechanisms [81], non-coding RNA [76,82], and epigenetic mechanisms such as methylation [28,83-85]. A deeper understanding of RNA-driven QTLs, whether cis or trans, tissue-specific or ubiquitous, coding or non-coding, or splicing-, decay- or epigenetic-related, may be critical to the interpretation of human phenotypic variability, to improving disease risk prediction, to understanding causal mechanisms, and to enabling targeted therapies.

Conclusions

Expression QTLs inform the interpretation of human trait variability, and may account for a greater fraction of phenotypic variability than protein-coding variants. Our analysis of >50 eQTL datasets, in a more extensive set of tissues than previously characterized, highlights the gene centricity of eQTLs and their overlap with regulatory features, as well as their strong enrichment in significant GWAS results for a wide variety of traits. Novel trans-eQTLs are suggested by our study but overall their identification remains challenging. Using new eQTL data from kidney and peripheral plaque we note intersections with GWAS for renal and arterial disease associations which may suggest causal genes or functional mechanisms. This large-scale synthesis of available tissue eQTL data identifies many strong and relatively ubiquitous cis-eQTLs that could serve as positive controls in future studies. Our results also suggest some of these common and strong tissue-ubiquitous eQTLs may have adaptive origins in humans. Efforts to expand the genetic, splicing and tissue coverage of known eQTLs will provide further insights into human gene regulation.

Methods

Ethics statement

Approvals for published eQTL studies are described in their original publications. New eQTL samples (kidney, peripheral artery plaque) described in conjunction with this study were collected with written informed consent and under institutional approvals. Ethical approval for the kidney eQTL study was obtained from the Stanford University Institutional Review Board (IRB protocol 3941), and that study was conducted according to the principles expressed in the Declaration of Helsinki. Multi-institutional approvals for the collection of peripheral artery plaque tissue were previously described [86].

Selection and collection of eQTL datasets

Many eQTL studies have been published in human and non-human species across a broad range of tissue and cell types. Early eQTL studies focused on the heritability and genetic basis of gene expression including several studies on lymphoblastoid cell lines used in the HapMap project. Several studies evaluated genetic variants related to drug response in cell lines. We focused our studies primarily on minimally altered human cells and tissues. Only one of the largest analyses of HapMap LCL samples was included here [27], and drug response, methylation, miRNA and non-human eQTL studies were excluded. Several published eQTL studies were not included since authors disclosed few results. Included studies, their citations and parameters are described in Table 1 and [Additional file 1]. The predominant tissue datasets are brain (n = 24 studies) and blood (n = 14), with other tissues including liver, adipose depots, kidney, skin, stomach and peripheral artery plaque. Previously unpublished data on kidney and peripheral artery plaque eQTLs are described in [Additional file 18]. Some previously published results were more extensively shared for the current analysis including liver, adipose and stomach [9], and lymphocytes [21].

Additional file 18. Supplemental methods description of eQTL analysis for novel data (kidney, peripheral plaque, HBTRC brain). Detailed methods and demographics for the new eQTL analyses included in this study.

Format: DOCX Size: 40KB

Unifying eQTL and eGene annotations into a cross-dataset database

The workflow of the complete analysis is delineated in [Additional file 19]. We define genes whose expression levels are significantly associated with SNPs as eGenes. The term does not imply a specific transcript isoform, since this information often cannot be determined from available data, but is likely to reflect expression variation in dominant gene isoforms. We refer to SNPs significantly associated with the expression of an eGene as eQTLs (expression QTL SNPs). After removing duplicate entries in some datasets, we used custom programs to map the remaining identifiers to unique NCBI Entrez Gene IDs, either directly or via alias identifiers for heterogeneous gene names, to create a harmonized eGene dataset for further analysis. In most subsequent analyses, only the strongest eQTL was kept for each eGene in each study. Unified genomic locations in the hg18/b36 reference (see Methods below) for each eGene and eQTL were used to recalculate eQTL-eGene distances and direction (5′/- or 3′/+), and this dataset was used for subsequent analysis.
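The harmonization and best-eQTL selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' custom programs: the record layout `(dataset, snp, gene_name, p_value)`, the lookup tables `symbol_to_entrez` and `alias_to_entrez`, and the helper names are all hypothetical, and "strongest" is assumed to mean the smallest association p-value.

```python
# Hypothetical sketch: harmonize reported gene names to Entrez Gene IDs and keep
# the strongest eQTL (smallest p-value) for each eGene in each dataset.


def map_gene(name, symbol_to_entrez, alias_to_entrez):
    """Return an Entrez Gene ID: direct symbol match first, then alias; None if unmappable."""
    if name in symbol_to_entrez:
        return symbol_to_entrez[name]
    return alias_to_entrez.get(name)


def best_eqtls(records, symbol_to_entrez, alias_to_entrez):
    """records: iterable of (dataset, snp, gene_name, p_value) tuples.

    Returns {(dataset, entrez_id): (snp, p_value)}.
    """
    best = {}
    # set() drops exact duplicate rows; sorted() makes tie-breaking deterministic
    for dataset, snp, gene_name, p in sorted(set(records)):
        gene_id = map_gene(gene_name, symbol_to_entrez, alias_to_entrez)
        if gene_id is None:
            continue  # unmappable identifier: excluded from the harmonized set
        key = (dataset, gene_id)
        if key not in best or p < best[key][1]:
            best[key] = (snp, p)
    return best


# Toy identifiers and IDs, not real annotations
symbols = {"GENEA": 1, "GENEB": 2}
aliases = {"GA1": 1}  # an alias of GENEA
records = [
    ("kidney", "rs1", "GENEA", 1e-5),
    ("kidney", "rs2", "GA1", 1e-9),
    ("kidney", "rs2", "GA1", 1e-9),  # duplicate entry
    ("blood", "rs3", "GENEB", 1e-4),
    ("blood", "rs4", "UNKNOWN", 1e-20),  # cannot be mapped
]
print(best_eqtls(records, symbols, aliases))
# {('blood', 2): ('rs3', 0.0001), ('kidney', 1): ('rs2', 1e-09)}
```

Keying on the harmonized ID rather than the reported name is what lets an alias such as `GA1` compete with `GENEA` for the same eGene slot.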

Additional file 19. Flow chart of overall study, data collection and annotation and analysis. Flow chart of overall study, data collection and annotation and analysis.

Format: DOC Size: 280KB

Filtering of low quality SNPs and unification of SNP genomic coordinates

Studies either reported no SNP coordinates, or reported them in hg18 or hg19 frameworks. We mapped all of the SNP rsIDs reported in the 53 datasets to dbSNP130 and used dbSNP reference genome mappings to obtain uniform genomic positions for SNPs in hg18/Build 36.3. We removed SNPs that mapped to >1 location or to the pseudo-autosomal region. For SNPs not initially mapped by this approach, we checked for alias SNP identifiers linked to dbSNP130 and, when available, used the alias IDs to complete mapping. In this manner, the majority of eQTLs were mapped to a single genomic position with high confidence.
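A minimal sketch of this SNP position filter, assuming the dbSNP mappings, the rsID alias (merge) table and the pseudo-autosomal interval coordinates have already been loaded into plain Python dictionaries and lists. All names below are hypothetical, and the toy interval in the example is not a real PAR coordinate.

```python
# Hypothetical sketch: resolve reported rsIDs to one reference-build position.


def in_intervals(chrom, pos, intervals):
    """True if (chrom, pos) falls in any closed interval (chrom, start, end)."""
    return any(c == chrom and start <= pos <= end for c, start, end in intervals)


def unify_positions(rsids, dbsnp_hits, alias_of, par_intervals):
    """
    rsids:         SNP identifiers as reported by a study
    dbsnp_hits:    rsID -> list of (chrom, pos) mappings in the reference build
    alias_of:      reported rsID -> alias (e.g. merged) rsID
    par_intervals: pseudo-autosomal intervals as (chrom, start, end)
    Returns {reported rsID: (chrom, pos)} for SNPs with one non-PAR location.
    """
    resolved = {}
    for rs in rsids:
        hits = dbsnp_hits.get(rs)
        if not hits and rs in alias_of:
            hits = dbsnp_hits.get(alias_of[rs])  # retry with the alias identifier
        if not hits or len(hits) > 1:
            continue  # unmapped, or maps to more than one location
        chrom, pos = hits[0]
        if in_intervals(chrom, pos, par_intervals):
            continue  # pseudo-autosomal region
        resolved[rs] = (chrom, pos)
    return resolved


# Toy inputs
hits = {
    "rs1": [("chr1", 1000)],
    "rs2": [("chr2", 5), ("chr5", 9)],  # multi-mapping SNP
    "rs3": [("chrX", 100)],             # inside the toy PAR interval
    "rs10": [("chr7", 42)],
}
print(unify_positions(["rs1", "rs2", "rs3", "rs9", "rs99"], hits,
                      {"rs9": "rs10"}, [("chrX", 1, 2000)]))
```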

Genomic locations for each gene boundary were retrieved from NCBI RefSeq 56 (GRCh36.3 assembly) using the hg18/b36 reference. If multiple transcripts/isoforms are transcribed from the same genomic locus/gene region, the maximal union of their boundaries was used. Data were retrieved using the biomaRt package [87], available through the Bioconductor repository [88]. eQTLs ≤ 500 kb from their associated eGenes were defined as cis. eQTLs > 500 kb away were defined as trans, and were further divided into trans-eQTLs on the same chromosome as the eGene and those on a different chromosome.
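The cis/trans rule can be sketched as follows. Two points are assumptions not stated in the text: distance is measured to the nearest boundary of the union gene span (0 for SNPs inside the gene), and the 5′/- versus 3′/+ direction is taken relative to the gene's strand. Function names are illustrative.

```python
# Hypothetical sketch of cis/trans classification with a 500 kb window.
CIS_WINDOW_BP = 500_000


def gene_span(transcripts):
    """Maximal union of transcript boundaries for one gene.

    transcripts: list of (chrom, start, end, strand) for the same gene.
    """
    chroms = {t[0] for t in transcripts}
    strands = {t[3] for t in transcripts}
    if len(chroms) != 1 or len(strands) != 1:
        raise ValueError("transcripts must share one chromosome and strand")
    start = min(t[1] for t in transcripts)
    end = max(t[2] for t in transcripts)
    return chroms.pop(), start, end, strands.pop()


def signed_distance(pos, start, end, strand):
    """0 inside the gene; negative if 5' (upstream), positive if 3' (downstream)."""
    if start <= pos <= end:
        return 0
    if pos < start:
        d = start - pos
        return -d if strand == "+" else d
    d = pos - end
    return d if strand == "+" else -d


def classify(snp_chrom, snp_pos, gene):
    """Return (category, signed distance, or None for a different chromosome)."""
    chrom, start, end, strand = gene
    if snp_chrom != chrom:
        return "trans (different chromosome)", None
    dist = signed_distance(snp_pos, start, end, strand)
    if abs(dist) <= CIS_WINDOW_BP:
        return "cis", dist
    return "trans (same chromosome)", dist


# Toy gene with two overlapping transcripts on the + strand
gene = gene_span([("chr6", 1000, 5000, "+"), ("chr6", 800, 4000, "+")])
print(gene, classify("chr6", 700, gene), classify("chr6", 600_000, gene))
```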

Summary of eGenes and eQTLs mapped to different categories

In total, 419,796 eQTLs were reported from the 53 eQTL datasets. Among them, 359,268 eQTLs and their associated eGenes were mapped to RefSeq gene symbols or gene aliases, so that genomic positions were available for both the eQTL and the eGene. We selected the strongest eQTL per eGene per unique dataset, yielding 116,563 best eQTLs (106,083 cis and 10,480 trans with the 500 kb threshold). Among these, 62,872 unique SNP identifiers were the best eQTL in one or more datasets, covering a total of 19,038 mapped eGenes.

Unsupervised hierarchical clustering

Unsupervised hierarchical clustering was used to assess patterns of regulatory variants across different tissues and cell types. Initially, a 19,038 × 53 (eGene × dataset) data matrix was constructed. Given the sparse nature of the matrix (most eGenes are unique to 1 study), we generated clusters based on eGenes present in higher proportions of studies (n = 15-53). The heatmap function in R 2.11 was used for clustering, with the distance function (distfun) set to binary distance.
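A Python analogue of this clustering step, offered as a sketch rather than the original R code: SciPy's Jaccard distance on boolean vectors equals R's `dist(method = "binary")` (the proportion of positions where exactly one vector is "on" among those where at least one is), and complete linkage matches the default of R's `hclust`, which `heatmap` uses unless told otherwise. The function name and the toy presence matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_datasets(presence, min_studies):
    """presence: boolean matrix, rows = eGenes, columns = datasets.

    Keeps eGenes present in at least `min_studies` datasets, then clusters the
    datasets (columns) with Jaccard (binary) distance and complete linkage.
    Returns the SciPy linkage matrix.
    """
    presence = np.asarray(presence, dtype=bool)
    kept = presence[presence.sum(axis=1) >= min_studies]
    distances = pdist(kept.T, metric="jaccard")
    return linkage(distances, method="complete")


# Toy eGene x dataset matrix; the last eGene occurs in only one dataset
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
], dtype=bool)
Z = cluster_datasets(toy, min_studies=2)
print(fcluster(Z, t=2, criterion="maxclust"))  # two groups of datasets
```

Filtering to eGenes shared by several studies before computing distances mirrors the text's handling of the sparse matrix: eGenes seen in a single study carry no information about which datasets resemble each other.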

Comparison of eQTLs to NHGRI GWAS catalog

The NHGRI GWAS catalog (March 22, 2013) was downloaded [89]. Expression SNPs strongly associated with gene expression traits were cross-referenced with SNPs in the GWAS catalog. Two sets of eQTLs (160,580 unique eQTLs and 62,872 unique best eQTLs) were compared against two sets of SNPs derived from the GWAS catalog (8,845 unique catalog SNPs, and 40,573 unique SNPs after adding SNPs in tight LD, r² = 1 in CEU based on SNAP [42] queries), yielding four pairwise comparisons.
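The four comparisons amount to set intersections between two eQTL SNP sets and two GWAS SNP sets. A minimal sketch, assuming proxy SNPs in perfect LD have already been exported from an LD lookup (such as SNAP) into a dictionary; the helper names and toy SNP identifiers are hypothetical.

```python
def expand_with_proxies(snps, proxies):
    """Add SNPs in perfect LD (r^2 = 1) with any input SNP.

    proxies: SNP -> iterable of proxy SNPs (hypothetical pre-exported LD lookup).
    """
    expanded = set(snps)
    for snp in snps:
        expanded.update(proxies.get(snp, ()))
    return expanded


def pairwise_overlaps(eqtl_sets, gwas_sets):
    """Count shared SNP identifiers for every (eQTL set, GWAS set) pair."""
    return {
        (e_name, g_name): len(e_snps & g_snps)
        for e_name, e_snps in eqtl_sets.items()
        for g_name, g_snps in gwas_sets.items()
    }


# Toy SNP sets: two eQTL sets x two GWAS sets = four comparisons
eqtl_sets = {"all": {"rs1", "rs2", "rs3"}, "best": {"rs1"}}
catalog = {"rs2", "rs9"}
gwas_sets = {
    "catalog": catalog,
    "catalog+proxies": expand_with_proxies(catalog, {"rs9": ["rs1"]}),
}
print(pairwise_overlaps(eqtl_sets, gwas_sets))
```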

Enrichment of eQTLs over protein-coding SNPs in full GWA trait scans

Full GWA trait scan statistics (n = 45 scans) were identified as part of the NHLBI GRASP database [38] and downloaded. Genomic lambda (λ) values were calculated relative to the null expectation for the full GWA distributions [90]. Likewise, λ values were calculated within each GWAS for the expression SNPs from the current study (n = 62,872 best eSNPs) and for nSNPs (nonsynonymous SNPs based on dbSNP annotation, n = 100,601). λ values were also calculated restricted to those GWAS results with P < 1E-2. Enrichment ratios were determined by comparing λ values of eQTLs versus non-eQTLs, and of nSNPs versus non-nSNPs. Kolmogorov-Smirnov tests were applied to test for differences in the distributions under each criterion. Individual lead cis-eQTLs and trans-eQTLs were directly assessed for presence in the GRASP database, which contains results from 1,390 GWAS studies.
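A sketch of the λ-based enrichment comparison, assuming the usual definition of the genomic inflation factor (median observed 1-df χ² statistic divided by its null median, about 0.455) and SciPy's two-sample Kolmogorov-Smirnov test. `gwas_p`, `annotated` and the helper names are hypothetical. When the P < 1E-2 threshold is applied, λ values are no longer calibrated to 1 under the null, so only the annotated versus non-annotated ratio is informative.

```python
import numpy as np
from scipy.stats import chi2, ks_2samp


def genomic_lambda(p_values):
    """Median observed 1-df chi-square statistic over its null median (~0.455)."""
    stats = chi2.isf(np.asarray(p_values, dtype=float), df=1)
    return float(np.median(stats) / chi2.ppf(0.5, df=1))


def enrichment(gwas_p, annotated, threshold=None):
    """Compare GWAS p-values of annotated SNPs (e.g. best eSNPs or nSNPs) with all others.

    gwas_p:    SNP -> GWAS p-value for one full scan
    annotated: set of SNP identifiers carrying the annotation
    threshold: optionally keep only GWAS results with P < threshold
    Returns (lambda ratio annotated / non-annotated, K-S test p-value).
    """
    kept = {s: p for s, p in gwas_p.items() if threshold is None or p < threshold}
    inside = [p for s, p in kept.items() if s in annotated]
    outside = [p for s, p in kept.items() if s not in annotated]
    ratio = genomic_lambda(inside) / genomic_lambda(outside)
    return ratio, ks_2samp(inside, outside).pvalue


# Toy scan: annotated SNPs carry smaller GWAS p-values than the rest
gwas = {"a": 1e-4, "b": 1e-3, "c": 1e-2, "d": 0.2, "e": 0.5, "f": 0.8}
print(enrichment(gwas, {"a", "b", "c"}))
```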

Comparison to human genome and regulatory features

We compared only the 62,872 unique best eQTLs to regulatory tracks. To account for the different sizes (in base pairs) of features reported by different tracks, the probability of a random base overlapping each regulatory track was calculated as the number of unique bases in the track divided by the total number of bases in the genome (3,080,436,451). Based on this probability, the expected number of overlaps between the 62,872 single-base eQTL positions and each track was computed. Binomial tests indicated whether observed overlaps were greater than expected by chance.
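The expected-overlap calculation and the one-sided binomial test can be sketched as below, using `scipy.stats.binomtest` (available in SciPy 1.7 or later). The function name is hypothetical, and the track size in the example is a toy value chosen to cover about 10% of the genome, not a real regulatory track.

```python
from scipy.stats import binomtest

GENOME_BP = 3_080_436_451  # total genome size used as the denominator in the text


def track_overlap_test(n_eqtls, track_unique_bp, observed):
    """One-sided binomial test of eQTL overlaps with one regulatory track.

    Each eQTL is a single base; under the null it lands in the track with
    probability track_unique_bp / GENOME_BP.
    Returns (expected overlaps, p-value for observed >= expected by chance).
    """
    p_hit = track_unique_bp / GENOME_BP
    expected = n_eqtls * p_hit
    result = binomtest(observed, n=n_eqtls, p=p_hit, alternative="greater")
    return expected, result.pvalue


# Toy track covering ~10% of the genome: ~6,287 overlaps expected among 62,872 eQTLs
print(track_overlap_test(62_872, 308_043_645, 7_000))
```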

Regulatory tracks (B36 coordinates) were downloaded from the UCSC Genome Browser [91] or other sites. The 22 regulatory features include ENCODE histone modification sites, transcription factor and CTCF insulator sites in lymphoblastoid cell lines, ORegAnno (Open Regulatory Annotation) [92], predicted TFBS (UCSC conserved transcription factor binding sites), Vista Enhancers [93], human selection sites as determined by FST and iHS (integrated haplotype scores), human microRNAs (miRBase 13) [94], TargetScan (predicted miRNA targets) [95], Patrocles (experimentally supported miRNA sites) [96], PolymiRTS (predicted SNP-miRNA binding sites) [97], UCSC functional RNAs (e.g., tRNA), UCSC CpG islands, long intergenic non-coding RNAs [98], and long-range 5C experiments in targeted ENCODE regions [50]. Specific top cis- and trans-eQTL SNPs were queried against ENCODE data using RegulomeDB [43].

The unique best cis-eQTLs were analyzed for differential representation across chromosomes. The total number of cis-eQTLs on each chromosome was normalized by each of 4 distinct features to produce 4 enrichment rankings: 1) total chromosome length (GRCh37.p11), 2) number of CCDS genes (release 11), 3) length of HuRef RNAs, and 4) number of HuRef variants. The 4 chromosome rankings were averaged to produce an overall rank for over-representation of cis-eQTLs.
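The chromosome ranking can be sketched as follows, assuming each feature's ranking orders chromosomes by cis-eQTL count per unit of that feature (highest density = rank 1) and that ties are broken by input order. Names and numbers in the example are hypothetical, not actual chromosome statistics.

```python
def rank_chromosomes(cis_counts, features):
    """
    cis_counts: chromosome -> number of unique best cis-eQTLs
    features:   feature name -> {chromosome -> value}, e.g. chromosome length,
                CCDS gene count, HuRef RNA length, HuRef variant count
    Returns (chromosomes ordered by mean rank, {chromosome: mean rank}),
    where rank 1 = highest cis-eQTL count per unit of the feature.
    """
    chroms = list(cis_counts)
    mean_rank = dict.fromkeys(chroms, 0.0)
    for per_chrom in features.values():
        density = {c: cis_counts[c] / per_chrom[c] for c in chroms}
        ordered = sorted(chroms, key=density.get, reverse=True)  # ties keep input order
        for rank, c in enumerate(ordered, start=1):
            mean_rank[c] += rank / len(features)
    return sorted(chroms, key=mean_rank.get), mean_rank


# Toy values (arbitrary units), two features instead of four for brevity
counts = {"chr1": 100, "chr2": 90, "chr21": 50}
features = {
    "length": {"chr1": 5, "chr2": 4, "chr21": 1},
    "genes": {"chr1": 4, "chr2": 2, "chr21": 1},
}
print(rank_chromosomes(counts, features))
```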

Housekeeping gene analysis

Housekeeping transcripts were defined based on a previous analysis of 18 human tissues [51]. Within our dataset, 2,207 eGenes were designated as housekeeping genes and 16,831 as non-housekeeping genes. The frequency of each eGene across datasets was calculated for housekeeping and non-housekeeping genes, and the two groups were compared by Student’s t-test.
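A minimal sketch of the housekeeping comparison, assuming each eGene's frequency is the number of datasets in which it has a best eQTL. `scipy.stats.ttest_ind` with its default `equal_var=True` is the classical Student's t-test. The input names and toy counts are hypothetical.

```python
from collections import Counter
from statistics import mean

from scipy.stats import ttest_ind


def housekeeping_comparison(best_pairs, housekeeping):
    """best_pairs: iterable of (dataset, egene) pairs, one per best eQTL.
    housekeeping: set of eGenes designated as housekeeping genes.
    Returns (mean #datasets for housekeeping eGenes, same for others, t-test p-value).
    """
    n_datasets = Counter(gene for _, gene in set(best_pairs))
    hk = [n for gene, n in n_datasets.items() if gene in housekeeping]
    other = [n for gene, n in n_datasets.items() if gene not in housekeeping]
    result = ttest_ind(hk, other)  # equal_var=True by default: Student's t-test
    return mean(hk), mean(other), result.pvalue


# Toy data: eGene -> number of datasets in which it appears
toy_counts = {"H1": 20, "H2": 25, "H3": 30, "N1": 1, "N2": 2, "N3": 3, "N4": 2}
pairs = [(f"d{i}", g) for g, n in toy_counts.items() for i in range(n)]
print(housekeeping_comparison(pairs, {"H1", "H2", "H3"}))
```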

Availability of supporting data

The primary data for some of the eQTL studies are available in public repositories, as described in the original reports. The summary-level eQTL results supporting this article are largely available in the full download of the NHLBI Genome-wide Repository of Associations between SNPs and Phenotypes (GRASPdb) [Build 1.0, http://apps.nhlbi.nih.gov/grasp/] [99].

Competing interests

The authors declare they have no competing interests.

Authors’ contributions

Conception for overall database (ADJ). Construction, annotation and analysis of overall database (XZ, ADJ). Collected, analyzed and provided kidney eQTL data (HG, SK), lymphocyte eQTL data (HHG, JEC, MPJ, JB), brain eQTL data (RD, VE), carotid artery plaque eQTL data (AP, RD, VE), blood/adipose eQTL data (AP, VE). Provided key input to the overall design and analysis of the database (XZ, DL, ADJ). Wrote the paper (XZ, ADJ). Provided editing of the manuscript (ADJ, XZ, CJO, DL, VE, RD, HG). All authors read and approved the final manuscript.

Acknowledgements

XZ and ADJ were supported by NIH Intramural Funds. The authors acknowledge Heather E. Wheeler for contribution to the kidney eQTL data. The kidney eQTL work was supported by the Glenn Center for Aging. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). The GTEx datasets used for the analyses described in this manuscript were obtained from the GTEx Portal on 08/06/2013. Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/SAIC-Frederick, Inc. (SAIC-F) subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through an SAIC-F subcontract to Van Andel Institute (10ST1035). Additional data repository and project management were provided by SAIC-F (HHSN261200800001E). The Brain Bank was supported by a supplement to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941), the University of Chicago (MH090951 & MH090937), the University of North Carolina - Chapel Hill (MH090936) and to Harvard University (MH090948).

References

  1. Cheung VG, Spielman RS: Genetics of human gene expression: mapping DNA variants that influence gene expression.

    Nat Rev Genet 2009, 10:595-604.

  2. Montgomery SB, Dermitzakis ET: From expression QTLs to personalized transcriptomics.

    Nat Rev Genet 2011, 12:277-282.

  3. Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET: Transcriptome genetics using second generation sequencing in a Caucasian population.

    Nature 2010, 464:773-777.

  4. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing.

    Nature 2010, 464:768-772.

  5. Chess A: Mechanisms and consequences of widespread random monoallelic expression.

    Nat Rev Genet 2012, 13:421-428.

  6. Johnson AD, Zhang Y, Papp AC, Pinsonneault JK, Lim JE, Saffen D, Dai Z, Wang D, Sadee W: Polymorphisms affecting gene transcription and mRNA processing in pharmacogenetic candidate genes: detection through allelic expression imbalance in human target tissues.

    Pharmacogenet Genomics 2008, 18:781-791.

  7. Rockman MV, Wray GA: Abundant raw material for cis-regulatory evolution in humans.

    Mol Biol Evol 2002, 19:1991-2004.

  8. Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, Verwoert GC, Hwang SJ, Pihur V, Vollenweider P, O’Reilly PF, Amin N, Bragg-Gresham JL, Teumer A, Glazer NL, Launer L, Zhao JH, Aulchenko Y, Heath S, Sober S, Parsa A, Luan J, Arora P, Dehghan A, Zhang F, Lucas G, Hicks AA, Jackson AU, et al.: Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk.

    Nature 2011, 478:103-109.

  9. Greenawalt DM, Dobrin R, Chudin E, Hatoum IJ, Suver C, Beaulaurier J, Zhang B, Castro V, Zhu J, Sieberts SK, Wang S, Molony C, Heymsfield SB, Kemp DM, Reitman ML, Lum PY, Schadt EE, Kaplan LM: A survey of the genetics of stomach, liver, and adipose gene expression from a morbidly obese cohort.

    Genome Res 2011, 21:1008-1016.

  10. Knight J, Barnes MR, Breen G, Weale ME: Using functional annotation for the empirical determination of Bayes Factors for genome-wide association study analysis.

    PLoS ONE 2011, 6:e14808.

  11. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ: Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS.

    PLoS Genet 2010, 6:e1000888.

  12. Tang W, Schwienbacher C, Lopez LM, Ben-Shlomo Y, Oudot-Mellakh T, Johnson AD, Samani NJ, Basu S, Gogele M, Davies G, Lowe GD, Tregouet DA, Tan A, Pankow JS, Tenesa A, Levy D, Volpato CB, Rumley A, Gow AJ, Minelli C, Yarnell JW, Porteous DJ, Starr JM, Gallacher J, Boerwinkle E, Visscher PM, Pramstaller PP, Cushman M, Emilsson V, Plump AS, et al.: Genetic associations for activated partial thromboplastin time and prothrombin time, their gene expression profiles, and risk of coronary artery disease.

    Am J Hum Genet 2012, 91:152-162.

  13. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Ripatti S, Chasman DI, Willer CJ, Johansen CT, Fouchier SW, Isaacs A, Peloso GM, Barbalic M, Ricketts SL, Bis JC, Aulchenko YS, Thorleifsson G, Feitosa MF, Chambers J, Orho-Melander M, Melander O, Johnson T, Li X, Guo X, Li M, Shin CY, Jin GM, Jin KY, et al.: Biological, clinical and population relevance of 95 loci for blood lipids.

    Nature 2010, 466:707-713.

  14. Innocenti F, Cooper GM, Stanaway IB, Gamazon ER, Smith JD, Mirkov S, Ramirez J, Liu W, Lin YS, Moloney C, Aldred SF, Trinklein ND, Schuetz E, Nickerson DA, Thummel KE, Rieder MJ, Rettie AE, Ratain MJ, Cox NJ, Brown CD: Identification, replication, and functional fine-mapping of expression quantitative trait loci in primary human liver tissue.

    PLoS Genet 2011, 7:e1002078.

  15. Schadt EE, Molony C, Chudin E, Hao K, Yang X, Lum PY, Kasarskis A, Zhang B, Wang S, Suver C, Zhu J, Millstein J, Sieberts S, Lamb J, GuhaThakurta D, Derry J, Storey JD, Avila-Campillo I, Kruger MJ, Johnson JM, Rohl CA, van Nas A, Mehrabian M, Drake TA, Lusis AJ, Smith RC, Guengerich FP, Strom SC, Schuetz E, Rushmore TH, et al.: Mapping the genetic architecture of gene expression in human liver.

    PLoS Biol 2008, 6:e107.

  16. Schroder A, Klein K, Winter S, Schwab M, Bonin M, Zell A, Zanger UM: Genomics of ADME gene expression: mapping expression quantitative trait loci relevant for absorption, distribution, metabolism and excretion of drugs in human liver.

    Pharmacogenomics J 2011, 13:12-20.

  17. Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H, Ingle C, Beazley C, Gutierrez Arcelus M, Sekowska M, Gagnebin M, Nisbett J, Deloukas P, Dermitzakis ET, Antonarakis SE: Common regulatory variation impacts gene expression in a cell type-dependent manner.

    Science 2009, 325:1246-1250.

  18. Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, Taylor J, Burnett E, Gut I, Farrall M, Lathrop GM, Abecasis GR, Cookson WO: A genome-wide association study of global gene expression.

    Nat Genet 2007, 39:1202-1207.

  19. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J, Carlson S, Helgason A, Walters GB, Gunnarsdottir S, Mouy M, Steinthorsdottir V, Eiriksdottir GH, Bjornsdottir G, Reynisdottir I, Gudbjartsson D, Helgadottir A, Jonasdottir A, Styrkarsdottir U, Gretarsdottir S, Magnusson KP, Stefansson H, Fossdal R, Kristjansson K, Gislason HG, Stefansson T, Leifsson BG, Thorsteinsdottir U, Lamb JR, Gulcher JR, et al.: Genetics of gene expression and its effect on disease.

    Nature 2008, 452:423-428.

  20. Fehrmann RS, Jansen RC, Veldink JH, Westra HJ, Arends D, Bonder MJ, Fu J, Deelen P, Groen HJ, Smolonska A, Weersma RK, Hofstra RM, Buurman WA, Rensen S, Wolfs MG, Platteel M, Zhernakova A, Elbers CC, Festen EM, Trynka G, Hofker MH, Saris CG, Ophoff RA, van den Berg LH, van Heel DA, Wijmenga C, Te Meerman GJ, Franke L: Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA.

    PLoS Genet 2011, 7:e1002197.

  21. Goring HH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, Cole SA, Jowett JB, Abraham LJ, Rainwater DL, Comuzzie AG, Mahaney MC, Almasy L, MacCluer JW, Kissebah AH, Collier GR, Moses EK, Blangero J: Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes.

    Nat Genet 2007, 39:1208-1216.

  22. Grundberg E, Kwan T, Ge B, Lam KC, Koka V, Kindmark A, Mallmin H, Dias J, Verlaan DJ, Ouimet M, Sinnett D, Rivadeneira F, Estrada K, Hofman A, van Meurs JM, Uitterlinden A, Beaulieu P, Graziani A, Harmsen E, Ljunggren O, Ohlsson C, Mellstrom D, Karlsson MK, Nilsson O, Pastinen T: Population genomics in a disease targeted primary cell model.

    Genome Res 2009, 19:1942-1952.

  23. Heap GA, Trynka G, Jansen RC, Bruinenberg M, Swertz MA, Dinesen LC, Hunt KA, Wijmenga C, Vanheel DA, Franke L: Complex nature of SNP genotype effects on gene expression in primary human leucocytes.

    BMC Med Genomics 2009, 2:1.

  24. Heinzen EL, Ge D, Cronin KD, Maia JM, Shianna KV, Gabriel WN, Welsh-Bohmer KA, Hulette CM, Denny TN, Goldstein DB: Tissue-specific genetic control of splicing: implications for the study of complex traits.

    PLoS Biol 2008, 6:e1.

  25. Idaghdour Y, Czika W, Shianna KV, Lee SH, Visscher PM, Martin HC, Miclaus K, Jadallah SJ, Goldstein DB, Wolfinger RD, Gibson G: Geographical genomics of human leukocyte gene expression variation in southern Morocco.

    Nat Genet 2010, 42:62-67.

  26. Murphy A, Chu JH, Xu M, Carey VJ, Lazarus R, Liu A, Szefler SJ, Strunk R, Demuth K, Castro M, Hansel NN, Diette GB, Vonakis BM, Adkinson NF Jr, Klanderman BJ, Senter-Sylvia J, Ziniti J, Lange C, Pastinen T, Raby BA: Mapping of numerous disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes.

    Hum Mol Genet 2010, 19:4745-4757.

  27. Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavare S, Deloukas P, Dermitzakis ET: Population genomics of human gene expression.

    Nat Genet 2007, 39:1217-1224.

  28. Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai SL, Arepalli S, Dillman A, Rafferty IP, Troncoso J, Johnson R, Zielke HR, Ferrucci L, Longo DL, Cookson MR, Singleton AB: Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain.

    PLoS Genet 2010, 6:e1000952.

  29. Liu C, Cheng L, Badner JA, Zhang D, Craig DW, Redman M, Gershon ES: Whole-genome association mapping of gene expression in the human prefrontal cortex.

    Mol Psychiatry 2010, 15:779-784.

  30. Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, Marlowe L, Kaleem M, Leung D, Bryden L, Nath P, Zismann VL, Joshipura K, Huentelman MJ, Hu-Lince D, Coon KD, Craig DW, Pearson JV, Holmans P, Heward CB, Reiman EM, Stephan D, Hardy J: A survey of genetic human cortical gene expression.

    Nat Genet 2007, 39:1494-1499.

  31. Zhang B, Gaiteri C, Bodea LG, Wang Z, McElwee J, Podtelezhnikov AA, Zhang C, Xie T, Tran L, Dobrin R, Fluder E, Clurman B, Melquist S, Narayanan M, Suver C, Shah H, Mahajan M, Gillis T, Mysore J, MacDonald ME, Lamb JR, Bennett DA, Molony C, Stone DJ, Gudnason V, Myers AJ, Schadt EE, Neumann H, Zhu J, Emilsson V: Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease.

    Cell 2013, 153:707-720.

  32. Romanoski CE, Che N, Yin F, Mai N, Pouldar D, Civelek M, Pan C, Lee S, Vakili L, Yang WP, Kayne P, Mungrue IN, Araujo JA, Berliner JA, Lusis AJ: Network for activation of human endothelial cells by oxidized phospholipids: a critical role of heme oxygenase 1.

    Circ Res 2011, 109:e27-e41.

  33. Ding J, Gudjonsson JE, Liang L, Stuart PE, Li Y, Chen W, Weichenthal M, Ellinghaus E, Franke A, Cookson W, Nair RP, Elder JT, Abecasis GR: Gene expression in skin and lymphoblastoid cells: refined statistical method reveals extensive overlap in cis-eQTL signals.

    Am J Hum Genet 2010, 87:779-789.

  34. Gaffney DJ: Global properties and functional complexity of human gene regulatory variation.

    PLoS Genet 2013, 9:e1003501.

  35. Bosse Y: Genome-wide expression quantitative trait loci analysis in asthma.

    Curr Opin Allergy Clin Immunol 2013, 13:487-494.

  36. Nica AC, Parts L, Glass D, Nisbet J, Barrett A, Sekowska M, Travers M, Potter S, Grundberg E, Small K, Hedman AK, Bataille V, Tzenova Bell J, Surdulescu G, Dimas AS, Ingle C, Nestle FO, di Meglio P, Min JL, Wilk A, Hammond CJ, Hassanali N, Yang TP, Montgomery SB, O’Rahilly S, Lindgren CM, Zondervan KT, Soranzo N, Barroso I, Durbin R, et al.: The architecture of gene regulatory variation across multiple human tissues: the MuTHER study.

    PLoS Genet 2011, 7:e1002003.

  37. Flutre T, Wen X, Pritchard J, Stephens M: A statistical framework for joint eQTL analysis in multiple tissues.

    PLoS Genet 2013, 9:e1003486.

  38. NHLBI Genome-wide Repository of Associations between SNPs and Phenotypes (GRASPdb) [http://apps.nhlbi.nih.gov/grasp/]; 2014

  39. Zeller T, Wild P, Szymczak S, Rotival M, Schillert A, Castagne R, Maouche S, Germain M, Lackner K, Rossmann H, Eleftheriadis M, Sinning CR, Schnabel RB, Lubos E, Mennerich D, Rust W, Perret C, Proust C, Nicaud V, Loscalzo J, Hubner N, Tregouet D, Munzel T, Ziegler A, Tiret L, Blankenberg S, Cambien F: Genetics and beyond–the transcriptome of human monocytes and disease susceptibility.

    PLoS ONE 2010, 5:e10693.

  40. Genotype-Tissue Expression Portal (GTEx) [http://www.gtexportal.org/home/]; 2014

  41. Ramasamy A, Trabzuni D, Gibbs JR, Dillman A, Hernandez DG, Arepalli S, Walker R, Smith C, Ilori GP, Shabalin AA, Li Y, Singleton AB, Cookson MR, Hardy J, Ryten M, Weale ME: Resolving the polymorphism-in-probe problem is critical for correct interpretation of expression QTL studies.

    Nucleic Acids Res 2013, 41:e88.

  42. Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PI: SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

    Bioinformatics 2008, 24:2938-2939.

  43. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, Cherry JM, Snyder M: Annotation of functional variation in personal genomes using RegulomeDB.

    Genome Res 2012, 22:1790-1797.

  44. Latourelle JC, Dumitriu A, Hadzi TC, Beach TG, Myers RH: Evaluation of Parkinson disease risk variants as expression-QTLs.

    PLoS ONE 2012, 7:e46199.

  45. Shen Q, Wang X, Chen Y, Xu L, Wang X, Lu L: Expression QTL and regulatory network analysis of microtubule-associated protein tau gene.

    Parkinsonism Relat Disord 2009, 15:525-531.

  46. Sankaran VG, Xu J, Ragoczy T, Ippolito GC, Walkley CR, Maika SD, Fujiwara Y, Ito M, Groudine M, Bender MA, Tucker PW, Orkin SH: Developmental and species-divergent globin switching are driven by BCL11A.

    Nature 2009, 460:1093-1097.

  47. Tang XF, Zhang Z, Hu DY, Xu AE, Zhou HS, Sun LD, Gao M, Gao TW, Gao XH, Chen HD, Xie HF, Tu CX, Hao F, Wu RN, Zhang FR, Liang L, Pu XM, Zhang JZ, Han JW, Pan GP, Wu JQ, Li K, Su MW, Du WD, Zhang WJ, Liu JJ, Xiang LH, Yang S, Zhou YW, Zhang XJ: Association analyses identify three susceptibility Loci for vitiligo in the Chinese Han population.

    J Invest Dermatol 2013, 133:403-410.

  48. Rotival M, Zeller T, Wild PS, Maouche S, Szymczak S, Schillert A, Castagne R, Deiseroth A, Proust C, Brocheton J, Godefroy T, Perret C, Germain M, Eleftheriadis M, Sinning CR, Schnabel RB, Lubos E, Lackner KJ, Rossmann H, Munzel T, Rendon A, Erdmann J, Deloukas P, Hengstenberg C, Diemert P, Montalescot G, Ouwehand WH, Samani NJ, Schunkert H, Tregouet DA, et al.: Integrating genome-wide genetic variations and monocyte expression data reveals trans-regulated gene modules in humans.

    PLoS Genet 2011, 7:e1002367.

  49. Kent WJ: BLAT–the BLAST-like alignment tool.

    Genome Res 2002, 12:656-664.

  50. Sanyal A, Lajoie BR, Jain G, Dekker J: The long-range interaction landscape of gene promoters.

    Nature 2012, 489:109-113.

  51. Zhu J, He F, Song S, Wang J, Yu J: How many human genes can be defined as housekeeping with current expression data?

    BMC Genomics 2008, 9:172.

  52. Kottgen A, Pattaro C, Boger CA, Fuchsberger C, Olden M, Glazer NL, Parsa A, Gao X, Yang Q, Smith AV, O’Connell JR, Li M, Schmidt H, Tanaka T, Isaacs A, Ketkar S, Hwang SJ, Johnson AD, Dehghan A, Teumer A, Pare G, Atkinson EJ, Zeller T, Lohman K, Cornelis MC, Probst-Hensch NM, Kronenberg F, Tonjes A, Hayward C, Aspelund T, et al.: New loci associated with kidney function and chronic kidney disease.

    Nat Genet 2010, 42:376-384.

  53. Chung SA, Taylor KE, Graham RR, Nititham J, Lee AT, Ortmann WA, Jacob CO, Alarcon-Riquelme ME, Tsao BP, Harley JB, Gaffney PM, Moser KL, Petri M, Demirci FY, Kamboh MI, Manzi S, Gregersen PK, Langefeld CD, Behrens TW, Criswell LA: Differential genetic associations for systemic lupus erythematosus based on anti-dsDNA autoantibody production.

    PLoS Genet 2011, 7:e1001323.

  54. Erdmann J, Grosshennig A, Braund PS, Konig IR, Hengstenberg C, Hall AS, Linsel-Nitschke P, Kathiresan S, Wright B, Tregouet DA, Cambien F, Bruse P, Aherrahrou Z, Wagner AK, Stark K, Schwartz SM, Salomaa V, Elosua R, Melander O, Voight BF, O’Donnell CJ, Peltonen L, Siscovick DS, Altshuler D, Merlini PA, Peyvandi F, Bernardinelli L, Ardissino D, Schillert A, Blankenberg S, et al.: New susceptibility locus for coronary artery disease on chromosome 3q22.3.

    Nat Genet 2009, 41:280-282.

  55. Yamada Y, Nishida T, Ichihara S, Sawabe M, Fuku N, Nishigaki Y, Aoyagi Y, Tanaka M, Fujiwara Y, Yoshida H, Shinkai S, Satoh K, Kato K, Fujimaki T, Yokoi K, Oguri M, Yoshida T, Watanabe S, Nozawa Y, Hasegawa A, Kojima T, Han BG, Ahn Y, Lee M, Shin DJ, Lee JH, Jang Y: Association of a polymorphism of BTN2A1 with myocardial infarction in East Asian populations.

    Atherosclerosis 2011, 215:145-152.

  56. Avery CL, He Q, North KE, Ambite JL, Boerwinkle E, Fornage M, Hindorff LA, Kooperberg C, Meigs JB, Pankow JS, Pendergrass SA, Psaty BM, Ritchie MD, Rotter JI, Taylor KD, Wilkens LR, Heiss G, Lin DY: A phenomics-based strategy identifies loci on APOC1, BRAP, and PLCG1 associated with metabolic syndrome phenotype domains.

    PLoS Genet 2011, 7:e1002322.

  57. Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, Holm H, Preuss M, Stewart AF, Barbalic M, Gieger C, Absher D, Aherrahrou Z, Allayee H, Altshuler D, Anand SS, Andersen K, Anderson JL, Ardissino D, Ball SG, Balmforth AJ, Barnes TA, Becker DM, Becker LC, Berger K, Bis JC, Boekholdt SM, Boerwinkle E, Braund PS, Brown MJ, Burnett MS, et al.: Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease.

    Nat Genet 2011, 43:333-338.

  58. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

    Nature 2007, 447:661-678.

  59. Gimelbrant A, Hutchinson JN, Thompson BR, Chess A: Widespread monoallelic expression on human autosomes.

    Science 2007, 318:1136-1140.

  60. Westra HJ, Jansen RC, Fehrmann RS, Te Meerman GJ, van Heel D, Wijmenga C, Franke L: MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects.

    Bioinformatics 2011, 27:2104-2111.

  61. Powell JE, Henders AK, McRae AF, Wright MJ, Martin NG, Dermitzakis ET, Montgomery GW, Visscher PM: Genetic control of gene expression in whole blood and lymphoblastoid cell lines is largely independent.

    Genome Res 2012, 22:456-466.

  62. Moyer AM, Salavaggione OE, Hebbring SJ, Moon I, Hildebrandt MA, Eckloff BW, Schaid DJ, Wieben ED, Weinshilboum RM: Glutathione S-transferase T1 and M1: gene sequence variation and functional genomics.

    Clin Cancer Res 2007, 13:7207-7216.

  63. Zhao Y, Marotta M, Eichler EE, Eng C, Tanaka H: Linkage disequilibrium between two high-frequency deletion polymorphisms: implications for association studies involving the glutathione-S transferase (GST) genes.

    PLoS Genet 2009, 5:e1000472.

  64. O’Bleness M, Searles VB, Varki A, Gagneux P, Sikela JM: Evolution of genetic and genomic features unique to the human lineage.

    Nat Rev Genet 2012, 13:853-866.

  65. Evans PD, Vallender EJ, Lahn BT: Molecular evolution of the brain size regulator genes CDK5RAP2 and CENPJ.

    Gene 2006, 375:75-79.

  66. Rimol LM, Agartz I, Djurovic S, Brown AA, Roddey JC, Kahler AK, Mattingsdal M, Athanasiu L, Joyner AH, Schork NJ, Halgren E, Sundet K, Melle I, Dale AM, Andreassen OA: Sex-dependent association of common variants of microcephaly genes with brain structure.

    Proc Natl Acad Sci U S A 2010, 107:384-388.

  67. Charrier C, Joshi K, Coutinho-Budd J, Kim JE, Lambert N, de Marchena J, Jin WL, Vanderhaeghen P, Ghosh A, Sassa T, Polleux F: Inhibition of SRGAP2 function by its human-specific paralogs induces neoteny during spine maturation.

    Cell 2012, 149:923-935.

  68. Dennis MY, Nuttle X, Sudmant PH, Antonacci F, Graves TA, Nefedov M, Rosenfeld JA, Sajjadian S, Malig M, Kotkiewicz H, Curry CJ, Shafer S, Shaffer LG, de Jong PJ, Wilson RK, Eichler EE: Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication.

    Cell 2012, 149:912-922.

  69. Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, Meltesen L, Brenton M, Hink R, Burgers S, Hernandez-Boussard T, Karimpour-Fard A, Glueck D, McGavran L, Berry R, Pollack J, Sikela JM: Lineage-specific gene duplication and loss in human and great ape evolution.

    PLoS Biol 2004, 2:E207.

  70. Gaffney DJ, Veyrieras JB, Degner JF, Pique-Regi R, Pai AA, Crawford GE, Stephens M, Gilad Y, Pritchard JK: Dissecting the regulatory architecture of gene expression QTLs.

    Genome Biol 2012, 13:R7.

  71. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, Shafer A, Neri F, Lee K, Kutyavin T, Stehling-Sun S, Johnson AK, Canfield TK, Giste E, Diegel M, Bates D, Hansen RS, Neph S, Sabo PJ, Heimfeld S, Raubitschek A, Ziegler S, Cotsapas C, Sotoodehnia N, Glass I, Sunyaev SR, et al.: Systematic localization of common disease-associated variation in regulatory DNA.

    Science 2012, 337:1190-1195.

  72. Holmquist GP, Wienberg J: Human Chromosome Evolution. Chichester: Wiley; 2008.

  73. Small KS, Hedman AK, Grundberg E, Nica AC, Thorleifsson G, Kong A, Thorsteindottir U, Shin SY, Richards HB, Soranzo N, Ahmadi KR, Lindgren CM, Stefansson K, Dermitzakis ET, Deloukas P, Spector TD, McCarthy MI: Identification of an imprinted master trans regulator at the KLF14 locus related to multiple metabolic phenotypes.

    Nat Genet 2011, 43:561-564.

  74. Vernot B, Stergachis AB, Maurano MT, Vierstra J, Neph S, Thurman RE, Stamatoyannopoulos JA, Akey JM: Personal and population genomics of human regulatory variation.

    Genome Res 2012, 22:1689-1697.

  75. Vavassori S, Kumar A, Wan GS, Ramanjaneyulu GS, Cavallari M, El Daker S, Beddoe T, Theodossis A, Williams NK, Gostick E, Price DA, Soudamini DU, Voon KK, Olivo M, Rossjohn J, Mori L, De Libero G: Butyrophilin 3A1 binds phosphorylated antigens and stimulates human gammadelta T cells.

    Nat Immunol 2013, 14:908-916.

  76. Kumar V, Westra HJ, Karjalainen J, Zhernakova DV, Esko T, Hrdlickova B, Almeida R, Zhernakova A, Reinmaa E, Vosa U, Hofker MH, Fehrmann RS, Fu J, Withoff S, Metspalu A, Franke L, Wijmenga C: Human disease-associated genetic variation impacts large intergenic non-coding RNA expression.

    PLoS Genet 2013, 9:e1003201.

  77. Gamazon ER, Ziliak D, Im HK, LaCroix B, Park DS, Cox NJ, Huang RS: Genetic architecture of microRNA expression: implications for the transcriptome and complex traits.

    Am J Hum Genet 2012, 90:1046-1063.

  78. Rantalainen M, Herrera BM, Nicholson G, Bowden R, Wills QF, Min JL, Neville MJ, Barrett A, Allen M, Rayner NW, Fleckner J, McCarthy MI, Zondervan KT, Karpe F, Holmes CC, Lindgren CM: MicroRNA expression in abdominal and gluteal adipose tissue is associated with mRNA expression levels and partly genetically driven.

    PLoS ONE 2011, 6:e27338.

  79. Liang L, Morar N, Dixon AL, Lathrop GM, Abecasis GR, Moffatt MF, Cookson WO: A cross-platform analysis of 14,177 expression quantitative trait loci derived from lymphoblastoid cell lines.

    Genome Res 2013, 23:716-726.

  80. GTEx Consortium: The genotype-tissue expression (GTEx) project.

    Nat Genet 2013, 45:580-585.

  81. Pai AA, Cain CE, Mizrahi-Man O, De Leon S, Lewellen N, Veyrieras JB, Degner JF, Gaffney DJ, Pickrell JK, Stephens M, Pritchard JK, Gilad Y: The contribution of RNA decay quantitative trait loci to inter-individual variation in steady-state gene expression levels.

    PLoS Genet 2012, 8:e1003000.

  82. Zhernakova DV, de Klerk E, Westra HJ, Mastrokolias A, Amini S, Ariyurek Y, Jansen R, Penninx BW, Hottenga JJ, Willemsen G, de Geus EJ, Boomsma DI, Veldink JH, van den Berg LH, Wijmenga C, den Dunnen JT, van Ommen GJ, ‘t Hoen PA, Franke L: DeepSAGE reveals genetic variants associated with alternative polyadenylation and expression of coding and non-coding transcripts.

    PLoS Genet 2013, 9:e1003594.

  83. Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y, Pritchard JK: DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines.

    Genome Biol 2011, 12:R10.

  84. Bell JT, Tsai PC, Yang TP, Pidsley R, Nisbet J, Glass D, Mangino M, Zhai G, Zhang F, Valdes A, Shin SY, Dempster EL, Murray RM, Grundberg E, Hedman AK, Nica A, Small KS, Dermitzakis ET, McCarthy MI, Mill J, Spector TD, Deloukas P: Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population.

    PLoS Genet 2012, 8:e1002629.

  85. Schalkwyk LC, Meaburn EL, Smith R, Dempster EL, Jeffries AR, Davies MN, Plomin R, Mill J: Allelic skewing of DNA methylation is widespread across the genome.

    Am J Hum Genet 2010, 86:196-212.

  86. Puig O, Yuan J, Stepaniants S, Zieba R, Zycband E, Morris M, Coulter S, Yu X, Menke J, Woods J, Chen F, Ramey DR, He X, O’Neill EA, Hailman E, Johns DG, Hubbard BK, Yee LP, Wright SD, Desouza MM, Plump A, Reiser V: A gene expression signature that classifies human atherosclerotic plaque by relative inflammation status.

    Circ Cardiovasc Genet 2011, 4:595-604.

  87. Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis.

    Bioinformatics 2005, 21:3439-3440.

  88. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics.

    Genome Biol 2004, 5:R80.

  89. NHGRI GWAS catalog [http://www.genome.gov/26525384]; 2014

  90. Devlin B, Roeder K: Genomic control for association studies.

    Biometrics 1999, 55:997-1004.

  91. UCSC Genome Browser [http://genome.ucsc.edu/]; 2014

  92. Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJ: ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation.

    Bioinformatics 2006, 22:637-640.

  93. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, Plajzer-Frick I, Akiyama J, De Val S, Afzal V, Black BL, Couronne O, Eisen MB, Visel A, Rubin EM: In vivo enhancer analysis of human conserved non-coding sequences.

    Nature 2006, 444:499-502.

  94. miRBase [http://www.mirbase.org/]; 2014

  95. TargetScan [http://www.targetscan.org/]; 2014

  96. Hiard S, Charlier C, Coppieters W, Georges M, Baurain D: Patrocles: a database of polymorphic miRNA-mediated gene regulation in vertebrates.

    Nucleic Acids Res 2010, 38:D640-D651.

  97. PolymiRTS [http://compbio.uthsc.edu/miRSNP/]; 2014

  98. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL: Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses.

    Genes Dev 2011, 25:1915-1927.

  99. Leslie R, O’Donnell CJ, Johnson AD: GRASP: analysis of genotype-phenotype results from 1,390 genome-wide association studies and corresponding open access database.

    Bioinformatics 2014, 30:i185-i194.