Cis-regulatory variations: A study of SNPs around genes showing cis-linkage in segregating mouse populations
1 Genetics, Rosetta Inpharmatics LLC, a wholly owned subsidiary of Merck & Co., Inc. 401 Terry Avenue North, Seattle, WA 98109, USA
2 Informatics, Rosetta Inpharmatics LLC, a wholly owned subsidiary of Merck & Co., Inc. 401 Terry Avenue North, Seattle, WA 98109, USA
3 Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA 90095-1679, USA
4 Microsoft Corporation, One Microsoft Way, Redmond, WA 98052-6399, USA
BMC Genomics 2006, 7:235 doi:10.1186/1471-2164-7-235
Published: 15 September 2006
Changes in gene expression are known to be responsible for phenotypic variation and susceptibility to diseases. Identification and annotation of the genomic sequence variants that cause gene expression changes is therefore likely to lead to a better understanding of the cause of disease at the molecular level. In this study we investigate the pattern of single nucleotide polymorphisms (SNPs) in genes for which the mRNA levels show cis-genetic linkage (gene expression quantitative trait loci mapping in cis, or cis-eQTLs) in segregating mouse populations. Such genes are expected to have polymorphisms near their physical location (cis-variations) that affect their mRNA levels by altering one or more of the cis-regulatory elements. This led us to characterize the SNPs in promoter (5 Kb upstream) and non-coding gene regions (introns and 5 Kb downstream) (cis-SNPs) and the effects they may have on putative transcription factor binding sites.
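The region definitions above (promoter as 5 Kb upstream; non-coding gene regions as introns plus 5 Kb downstream) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Gene` record, the `classify_snp` function, the 1-based inclusive coordinate convention, and the choice to report exonic positions as `None` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FLANK = 5_000  # 5 Kb flank on each side of the gene, as stated in the text


@dataclass
class Gene:
    """Hypothetical gene model; coordinates are 1-based and inclusive (an assumption)."""
    start: int                    # lowest genomic coordinate of the transcript
    end: int                      # highest genomic coordinate of the transcript
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # inclusive (start, end) exon intervals


def classify_snp(pos: int, gene: Gene) -> Optional[str]:
    """Return "promoter", "intron", or "downstream" for a cis-SNP position.

    Returns None for exonic positions and for positions outside the
    gene body plus its 5 Kb flanks.
    """
    if gene.start <= pos <= gene.end:
        in_exon = any(s <= pos <= e for s, e in gene.exons)
        return None if in_exon else "intron"
    # The flank at lower coordinates is upstream on "+" and downstream on "-".
    if gene.start - FLANK <= pos < gene.start:
        return "promoter" if gene.strand == "+" else "downstream"
    # The flank at higher coordinates is downstream on "+" and upstream on "-".
    if gene.end < pos <= gene.end + FLANK:
        return "downstream" if gene.strand == "+" else "promoter"
    return None
```

For a plus-strand gene spanning 10000 to 20000, a SNP at position 9000 would be labelled a promoter SNP under these assumptions, while the same position would be labelled downstream if the gene were on the minus strand.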
We demonstrate that the cis-eQTL genes (CEGs) have a significantly higher frequency of cis-SNPs compared to non-CEGs (when both sets are taken from the non-IBD regions, i.e. regions not identical by descent). Most CEGs having cis-SNPs do not contain these SNPs in the phylogenetically conserved regions. In those CEGs that contain cis-SNPs in the phylogenetically conserved regions, enrichment of cis-SNPs occurs both within and outside of the conserved sequences. A higher fraction of CEGs is also seen to harbor cis-SNPs that affect predicted transcription factor binding sites, a likely consequence of the higher cis-SNP density in these genes.
The present study provides the first genome-wide investigation of the putative cis-regulatory variations in a large set of genes whose levels of expression give rise to cis-linkage in segregating mammalian populations. Our results provide insights into the challenges that exist in identifying polymorphisms regulating gene expression using bioinformatic sequence analysis approaches. The data provided herein should benefit future investigations in this area.