Research article

cis sequence effects on gene expression

Andrew W Bergen1,2*, Andrea Baccarelli3,4, Timothy K McDaniel5, Kenneth Kuhn5, Ruth Pfeiffer1, Jerry Kakol5, Patrick Bender6, Kevin Jacobs1,7, Bernice Packer1,7,8, Stephen J Chanock1,7 and Meredith Yeager1,7,8

Author Affiliations

1 Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA

2 Center for Health Sciences, Policy Division, SRI International, Menlo Park, CA USA

3 School of Public Health, Harvard University, Boston, MA USA

4 Molecular Epidemiology and Genetics, EPOCA Epidemiology Center, Maggiore Hospital, Mangiagalli and Regina Elena IRCCS Foundation & University of Milan, Milan, Italy

5 Illumina, San Diego, CA USA

6 Coriell Institute for Medical Research, Camden, NJ USA

7 Core Genotyping Facility, National Cancer Institute, Gaithersburg, MD USA

8 Science Applications International Corporation-National Cancer Institute (NCI), NCI-FCRDC, Frederick, MD USA


BMC Genomics 2007, 8:296  doi:10.1186/1471-2164-8-296

Published: 29 August 2007



Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the effect of sequence variation in cis on gene expression (cis sequence effects) in lymphoblastoid cell lines, for a group of genes commonly studied in cancer research. Using three different analytical approaches, we estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects, and compared our results with the literature.
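One common way to express "the proportion of gene expression variation explained" for a single SNP is the R² of a simple additive regression, where each cell line's genotype is coded as its count of one allele (0, 1 or 2) and expression is log-transformed. The sketch below illustrates that idea in pure Python; the function name `additive_r2`, the base-2 logarithm and the 0/1/2 coding are illustrative assumptions, not the authors' code. R² itself does not depend on the logarithm base, whereas the slope does.

```python
import math
from statistics import mean


def additive_r2(genotypes, expression):
    """Hypothetical helper: fit log2(expression) ~ allele count by least squares.

    genotypes:  allele counts per cell line (0, 1 or 2), additive coding assumed.
    expression: raw, strictly positive expression values for the same lines.
    Returns (slope, r2): change in log2 expression per allele copy, and the
    fraction of log-expression variance explained by the SNP.
    """
    if len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have the same length")
    y = [math.log2(e) for e in expression]  # log-transform (base 2 assumed)
    x = [float(g) for g in genotypes]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both genotype and expression")
    slope = sxy / sxx              # ordinary least-squares slope
    r2 = sxy ** 2 / (sxx * syy)    # squared Pearson correlation = R^2 for one predictor
    return slope, r2


# Usage with made-up values for six hypothetical cell lines:
slope, r2 = additive_r2([0, 0, 1, 1, 2, 2], [100, 110, 200, 190, 400, 420])
print(f"slope={slope:.2f} log2 units per allele, R^2={r2:.2f}")
```

Models that use all SNPs at a gene, or diplotypes, would need multiple regression rather than this single-predictor form.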


We generated gene expression profiling data for N = 697 candidate genes in N = 30 lymphoblastoid cell lines for this study, and used available candidate gene resequencing data for N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets to investigate cis sequence effects. We used two additive models and Templeton's haplotype phylogeny scanning approach (Tree Scanning) to evaluate the association of individual SNPs, all SNPs at a gene, and diplotypes with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p < 0.05) association with gene expression. Using the literature as a "gold standard" for the 14 genes with data from both this study and the literature, we observed 80% concordance for genes exhibiting significant cis sequence effects in our study and 85% concordance for genes not exhibiting them.
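The concordance figures compare each gene's call in this study (significant or not) with the call reported in the literature, split by this study's call. The abstract does not spell out the exact counting scheme (for example, whether several published reports on one gene are counted separately), so the sketch below assumes one literature call per gene. The function name and the gene labels are hypothetical, and the toy data are not from the paper.

```python
def concordance(ours, literature):
    """Hypothetical helper: agreement with a literature 'gold standard'.

    ours, literature: dicts mapping gene -> True if a significant cis effect
    was reported. Only genes present in both dicts are compared.
    Returns (agreement among genes significant in our data,
             agreement among genes not significant in our data).
    """
    shared = ours.keys() & literature.keys()
    positives = [g for g in shared if ours[g]]
    negatives = [g for g in shared if not ours[g]]
    # Fraction of our significant genes that the literature also calls significant.
    pos_agree = (sum(literature[g] for g in positives) / len(positives)
                 if positives else float("nan"))
    # Fraction of our non-significant genes that the literature also calls non-significant.
    neg_agree = (sum(not literature[g] for g in negatives) / len(negatives)
                 if negatives else float("nan"))
    return pos_agree, neg_agree


# Toy example with hypothetical genes (not the study's data):
ours = {"GENE_A": True, "GENE_B": True, "GENE_C": False, "GENE_D": False}
lit = {"GENE_A": True, "GENE_B": False, "GENE_C": False, "GENE_D": False}
print(concordance(ours, lit))
```

Splitting by the study's own call mirrors how the abstract reports two separate percentages rather than a single overall agreement rate.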


Based on analysis of our results and the extant literature, approximately one in four genes exhibits significant cis sequence effects, and for these genes about 30% of gene expression variation is accounted for by cis sequence variation. Despite the diversity of experimental approaches, the presence or absence of significant cis sequence effects in our data is largely supported by previously published studies.
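The two summary figures, the share of genes with significant cis effects and the average variance explained among those genes, can be computed from per-gene results as sketched below. This is an illustrative assumption: the input layout `(gene, p_value, r2)`, the function name and the toy values are hypothetical. The paper's estimates also draw on the published literature, and this sketch applies no multiple-testing correction.

```python
from statistics import mean


def summarize_cis_effects(results, alpha=0.05):
    """Hypothetical helper summarizing per-gene cis-effect tests.

    results: list of (gene, p_value, r2) tuples, one per gene tested.
    Returns (fraction of genes with p < alpha,
             mean r2 among those significant genes, or nan if none).
    """
    if not results:
        raise ValueError("no genes supplied")
    hits = [r2 for _gene, p, r2 in results if p < alpha]
    fraction = len(hits) / len(results)
    mean_r2 = mean(hits) if hits else float("nan")
    return fraction, mean_r2


# Toy input for four hypothetical genes (not the study's data):
toy = [("GENE_A", 0.01, 0.30), ("GENE_B", 0.40, 0.04),
       ("GENE_C", 0.20, 0.06), ("GENE_D", 0.75, 0.01)]
print(summarize_cis_effects(toy))
```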