Research article

Cis-regulatory variations: A study of SNPs around genes showing cis-linkage in segregating mouse populations

Debraj GuhaThakurta1*, Tao Xie1, Manish Anand1,4, Stephen W Edwards1, Guoya Li2, Susanna S Wang3 and Eric E Schadt1*

Author Affiliations

1 Genetics, Rosetta Inpharmatics LLC, a wholly owned subsidiary of Merck & Co., Inc. 401 Terry Avenue North, Seattle, WA 98109, USA

2 Informatics, Rosetta Inpharmatics LLC, a wholly owned subsidiary of Merck & Co., Inc. 401 Terry Avenue North, Seattle, WA 98109, USA

3 Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA 90095-1679, USA

4 Microsoft Corporation, One Microsoft Way, Redmond, WA 98052-6399, USA


BMC Genomics 2006, 7:235  doi:10.1186/1471-2164-7-235

Published: 15 September 2006



Changes in gene expression are known to be responsible for phenotypic variation and susceptibility to disease. Identifying and annotating the genomic sequence variants that cause gene expression changes is therefore likely to lead to a better understanding of disease at the molecular level. In this study we investigate the pattern of single nucleotide polymorphisms (SNPs) in genes whose mRNA levels show cis-genetic linkage (gene expression quantitative trait loci mapping in cis, or cis-eQTLs) in segregating mouse populations. Such genes are expected to have polymorphisms near their physical location (cis-variations) that affect their mRNA levels by altering one or more cis-regulatory elements. We therefore characterized the SNPs in promoter regions (5 kb upstream) and non-coding gene regions (introns and 5 kb downstream), referred to here as cis-SNPs, together with the effects they may have on putative transcription factor binding sites.
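The cis-SNP definition above, and the idea of checking whether a SNP alters a putative transcription factor binding site, can be illustrated with a short Python sketch. It assumes a hypothetical minimal gene model (`GeneModel`, 1-based inclusive coordinates) and a simple log-odds position weight matrix (PWM) scored against a uniform background. The PWM scheme, the pseudocount and all helper names are illustrative assumptions, not the authors' pipeline.

```python
import math
from dataclasses import dataclass, field

FLANK_BP = 5000  # 5 kb promoter and downstream windows, as described in the text

@dataclass
class GeneModel:
    """Hypothetical minimal gene model (1-based, inclusive coordinates)."""
    start: int
    end: int
    strand: str                                  # "+" or "-"
    exons: list = field(default_factory=list)    # [(start, end), ...]

def classify_snp(gene, pos):
    """Label a SNP position relative to a gene, or return None if outside all windows."""
    if gene.start <= pos <= gene.end:
        # Exonic SNPs are labelled separately; only intronic ones count as cis-SNPs here.
        if any(s <= pos <= e for s, e in gene.exons):
            return "exon"
        return "intron"
    before = gene.start - FLANK_BP <= pos < gene.start   # lower-coordinate flank
    after = gene.end < pos <= gene.end + FLANK_BP        # higher-coordinate flank
    if gene.strand == "-":
        before, after = after, before                    # 5'/3' orientation flips on the - strand
    if before:
        return "promoter"
    if after:
        return "downstream"
    return None

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def count_matrix_to_log_odds(counts, pseudocount=0.25, background=0.25):
    """Turn per-position base counts into a log2-odds PWM (uniform background assumed)."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def best_hit(pwm, seq):
    """Best PWM score over all windows of seq on both strands."""
    w = len(pwm)
    best = -math.inf
    for s in (seq, seq.translate(COMPLEMENT)[::-1]):
        for j in range(len(s) - w + 1):
            best = max(best, sum(pwm[i][s[j + i]] for i in range(w)))
    return best

def allele_score_change(pwm, left, right, ref, alt):
    """Best-motif-score change (alt minus ref) for a single-base SNP."""
    w = len(pwm)
    if len(left) < w - 1 or len(right) < w - 1:
        raise ValueError("need at least motif_length - 1 flanking bases on each side")
    # Keeping exactly w-1 flanking bases means every scored window covers the SNP.
    l, r = left[len(left) - (w - 1):], right[:w - 1]
    return best_hit(pwm, l + alt + r) - best_hit(pwm, l + ref + r)
```

For a plus-strand gene spanning positions 10,000 to 20,000, `classify_snp` labels position 7,000 as `"promoter"`. With a toy GATA-like matrix, a T→C change that breaks the consensus yields a negative score change. Restricting the flanks to motif length minus one keeps the comparison focused on windows the SNP can actually affect.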


We demonstrate that the cis-eQTL genes (CEGs) have a significantly higher frequency of cis-SNPs than non-CEGs when both sets are taken from non-IBD regions (i.e. regions not identical by descent). Most CEGs with cis-SNPs do not contain these SNPs in phylogenetically conserved regions. In those CEGs that do contain cis-SNPs in phylogenetically conserved regions, cis-SNPs are enriched both within and outside the conserved sequences. A higher fraction of CEGs also harbor cis-SNPs that affect predicted transcription factor binding sites, likely a consequence of the higher cis-SNP density in these genes.
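One way to frame the CEG versus non-CEG comparison is a one-sided Fisher's exact test on a 2×2 table of genes with and without at least one cis-SNP. The abstract does not state which statistical test was used, so this plain-Python sketch is an assumption, `fisher_greater` is a hypothetical helper name, and the counts below are invented for illustration.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the top-left cell.

    Table layout (gene counts):
                    >=1 cis-SNP   no cis-SNP
        CEGs             a             b
        non-CEGs         c             d

    Returns P(X >= a), where X is the number of CEGs with a cis-SNP under the
    hypergeometric null that cis-SNP status is independent of CEG status.
    """
    n_ceg = a + b          # row total: CEGs
    n_snp = a + c          # column total: genes with >= 1 cis-SNP
    n = a + b + c + d      # all genes in the comparison
    upper = min(n_ceg, n_snp)
    # Sum hypergeometric probabilities for outcomes at least as extreme as observed.
    tail = sum(comb(n_snp, x) * comb(n - n_snp, n_ceg - x) for x in range(a, upper + 1))
    return tail / comb(n, n_ceg)
```

For example, `fisher_greater(8, 2, 2, 8)` evaluates to 2126/184756, about 0.0115. Counting genes rather than individual SNPs keeps each gene as one observation; a density-based comparison (SNPs per kb of cis-region) would be an equally plausible alternative.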


The present study provides the first genome-wide investigation of putative cis-regulatory variations in a large set of genes whose expression levels give rise to cis-linkage in segregating mammalian populations. Our results provide insights into the challenges of identifying polymorphisms that regulate gene expression using bioinformatic sequence analysis approaches. The data provided herein should benefit future investigations in this area.