
Open Access Methodology article

Gene expression and nucleotide composition are associated with genic methylation level in Oryza sativa

Eran Elhaik1,2, Matteo Pellegrini3 and Tatiana V Tatarinova4*

Author Affiliations

1 Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205 USA

2 Department of Animal and Plant Sciences, University of Sheffield, Western Bank, Sheffield S10 2TN, UK

3 Molecular, Cell, and Developmental Biology, University of California, 610 Charles Young Drive East, Los Angeles, CA 90095, USA

4 Children’s Hospital Los Angeles, Keck School of Medicine, University of Southern California, 4650 Sunset Blvd, Los Angeles, CA 90027, USA


BMC Bioinformatics 2014, 15:23 doi:10.1186/1471-2105-15-23

Published: 21 January 2014



The methylation of cytosines at CpG dinucleotides, which plays an important role in regulating gene expression, is one of the most studied epigenetic modifications. To date, DNA methylation has been measured mostly by experimental methods, which are not only prone to bench effects and artifacts but also time-consuming, expensive, and difficult to scale up to many samples. It is therefore useful to develop computational methods for predicting DNA methylation. Our previous studies highlighted correlations among the GC content of the third codon position (GC3), methylation, and gene expression. We therefore designed a model that predicts methylation in Oryza sativa from genomic sequence features and gene expression data.
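GC3 is the fraction of third codon positions occupied by G or C. A minimal sketch of the calculation, assuming an in-frame coding sequence supplied as a plain string, is shown below. The function name `gc3` is our own; the sketch ignores a trailing incomplete codon, does not exclude stop codons, and counts any non-G/C base (including ambiguous bases such as N) at a third position as non-GC.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` is in frame (starts at the first base of a codon).
    Bases left over after the last complete codon are ignored.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    # Third positions are indices 2, 5, 8, ... within the complete codons.
    third_positions = seq[2:3 * n_codons:3]
    return sum(base in "GC" for base in third_positions) / n_codons


# Codons ATG, GCC, AAA, TTT have third bases G, C, A, T -> 2 of 4 are G/C.
print(gc3("ATGGCCAAATTT"))  # 0.5
```

Genes with high GC3 would score close to 1 under this definition and GC3-poor genes close to 0.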


We first derive equations that describe the relationship between gene methylation level and GC3, expression, length, and other gene compositional features. We then assess compositional features based on sixmers (DNA words of length six) and their association with methylation levels and other gene-level properties. By applying our sixmer-based approach to rice gene expression data, we show that it predicts methylation accurately (Pearson's correlation coefficient r = 0.79) for the majority (79%) of genes. MATLAB code implementing our model is included.
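Two ingredients named above, a per-gene sixmer composition profile and Pearson's correlation coefficient for scoring predictions against measured methylation, can be sketched in Python as follows. This is not the authors' model or their MATLAB code: the helper names `sixmer_frequencies` and `pearson_r` are hypothetical, and counting overlapping windows while skipping windows that contain non-ACGT characters are our own assumptions.

```python
from itertools import product
from math import sqrt


def sixmer_frequencies(seq: str) -> dict:
    """Relative frequencies of all 4**6 = 4096 sixmers over A/C/G/T.

    Uses overlapping windows; windows containing non-ACGT characters are skipped.
    """
    seq = seq.upper()
    counts = {"".join(k): 0 for k in product("ACGT", repeat=6)}
    total = 0
    for i in range(len(seq) - 5):
        kmer = seq[i:i + 6]
        if kmer in counts:  # only pure A/C/G/T words are counted
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: c / total for k, c in counts.items()}


def pearson_r(x, y) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


freqs = sixmer_frequencies("ACGTACGTAC")
print(freqs["ACGTAC"])  # 0.4 (2 of the 5 windows)
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

In a predictor of the kind described, each gene's 4096-dimensional profile (alongside features such as GC3, length, and expression) would feed a model whose predicted methylation levels are then compared with measured levels via `pearson_r`.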


Variation in gene expression can be used to predict gene methylation levels.

Keywords: DNA methylation; Gene expression; GC3; Prediction; Oryza sativa