Open Access Research article

Implications of human genome structural heterogeneity: functionally related genes tend to reside in organizationally similar genomic regions

Arnon Paz, Svetlana Frenkel, Sagi Snir, Valery Kirzhner and Abraham B Korol*

Author Affiliations

Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel

For all author emails, please log on.

BMC Genomics 2014, 15:252  doi:10.1186/1471-2164-15-252

Published: 31 March 2014

Abstract

Background

In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA), to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., its coding and non-coding parts the latter being much more abundant in the genome than the former.

Results

We identified 38 OP-type clusters of segments that differ in their compositional spectrum (CS) organization. Many of the segments that shared the same OP type were enriched with genes related to the same biological processes (developmental, signaling, etc.), components of biochemical complexes, or organelles. Thirteen OP-type clusters showed significant enrichment in genes connected to specific gene-ontology terms. Some of these clusters seemed to reflect certain events during periods of horizontal gene transfer and genome expansion, and subsequent evolution of genomic regions requiring coordinated regulation.

Conclusions

There may be a tendency for genes that are involved in the same biological process, complex or organelle to use the same OP, even at a distance of ~ 100 kb from the genes. Although the intergenic DNA is non-coding, the general pattern of sequence organization (e.g., reflected in over-represented oligonucleotide “words”) may be important and were protected, to some extent, in the course of evolution.

Keywords:
Compositional spectra analysis; Sequence organization pattern; Horizontal gene transfer; Whole genome duplication