Table 1

Comparison of nucleosome occupancy prediction models on different data sets

Performance (Pearson R) is reported in the six data-set columns; the final column gives each model's correlation with %G+C (yeast, 150 bp windows).

| Model | Summary | Synthetic oligonucleotides (Microarray) [8] | Synthetic oligonucleotides (Sequencing) [8] | Yeast in vitro [8] | Yeast in vivo [2] | C. elegans adjusted nucleosome coverage [34] | C. elegans normalized occupancy [34] | Correlation with %G+C (Yeast, 150 bp windows) |
|---|---|---|---|---|---|---|---|---|
| Kaplan et al., 2009[8] | Probabilistic model based on in vitro 5-mer preferences and periodic dinucleotide signal. | 0.51* | 0.45* | 0.89* | 0.34 | 0.47* | 0.61* | 0.87 |
| Lasso model (this study) | See Methods. | 0.44 | 0.41 | 0.86* | 0.38* | 0.49* | 0.66* | 0.85 |
| Field et al., 2008[24] | Probabilistic model based on 5-mer preferences measured in vivo (yeast) and periodic dinucleotide signals. | 0.47* | 0.45* | 0.74 | 0.39* | 0.46* | 0.61* | 0.64 |
| %G+C | The percentage of guanine and cytosine bases in a DNA sequence. | 0.53* | 0.49* | 0.78* | 0.25 | 0.42 | 0.47 | 1 |
| Lasso model[2] | Linear regression model trained on in vivo nucleosome occupancy data. Uses DNA structural parameters, excluding sequences and transcription factor binding sites (ABF1, REB1, and STB2) as inputs. | 0.23 | 0.22 | 0.63 | 0.45* | 0.38 | 0.5 | 0.55 |
| Peckham et al., 2007[25] | SVM classifier trained on overrepresented k-mers (k = 1-6) found in nucleosome occupied and depleted sequences determined from in vivo yeast data. | 0.43 | 0.39 | 0.48 | 0.22 | 0.29 | 0.33 | 0.57 |
| Yuan and Liu, 2008[26] | Computes predicted nucleosome occupancy based on periodic dinucleotide signals found in nucleosomal and linker DNA sequences determined from in vitro and in vivo experiments in yeast. | 0.02 | 0.05 | 0.35 | 0.27 | 0.36 | 0.48 | 0.30 |
| Miele et al., 2008[29] | Computes free energy landscape of nucleosome formation using an estimation of dinucleotide-dependent DNA flexibility and intrinsic curvature. | 0.32 | 0.26 | 0.38 | 0.22 | 0.21 | 0.25 | 0.49 |
| Segal et al., 2006[23] (downloaded January 2007) | Probabilistic model trained on yeast data, using a position specific scoring matrix derived from a collection of nucleosome-bound sequences obtained from in vitro selection experiments. | NaN | NaN | 0.05 | 0.09 | 0.05 | 0.05 | 0.07 |
| Ioshikhes et al., 2006[22] | Computes the correlation of periodic AA/TT dinucleotide motifs in a given sequence with those found in a set of 204 eukaryotic and viral nucleosomal sequences determined through in vivo and in vitro experiments[20]. | -0.03 | -0.03 | 0.01 | 0.07 | -0.03 | -0.01 | 0.01 |
| Tolstorukov et al., 2007, 2008[31,32] | Estimates the dinucleotide-dependent cost of deformation caused by threading a given sequence on a template comprising the path of DNA found on the experimentally determined structure of the nucleosome core particle. | 0.01 | 0.004 | 0 | -0.001 | -0.001 | -0.001 | -0.0003 |
| Segal et al., 2006[23] (downloaded August 2009) | Probabilistic model trained on yeast data, using a position specific scoring matrix derived from a collection of nucleosome-bound sequences obtained from in vitro selection experiments. | NaN | NaN | -0.2 | 0.001 | -0.06 | -0.05 | -0.21 |

Performance is reported as the Pearson correlation (R) between predicted and measured nucleosome occupancy. In yeast, occupancy was predicted using only sequence from the test set (chr10-16); in C. elegans, only chromosome III was used. "NaN" indicates that a score of "0" was obtained for every sequence, because this model[23] requires sequences longer than 150 bp. Models are sorted by their average rank in performance. Asterisks (*) denote the top three performing models for each data set, and text in bold denotes the top 50%.
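As a rough illustration of the two quantities used in this table, the following Python sketch computes %G+C in fixed-size windows and the Pearson correlation between two score vectors. It is not the authors' code: the function names, the `step` parameter (the table states only the 150 bp window size), and the choice to return NaN for constant input are assumptions made for illustration. The last point mirrors why a model that scores every sequence "0" has no defined Pearson R.

```python
from math import sqrt


def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence (case-insensitive).

    Every character counts toward the length, so ambiguous bases such
    as N lower the percentage (an assumption of this sketch).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window=150, step=1):
    """%G+C of each full window of `window` bp, advancing `step` bp.

    `step` is hypothetical; the table specifies only the window size.
    """
    return [gc_percent(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns NaN when either input has zero variance, since the
    correlation is undefined in that case.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)
```

Under these assumptions, a value such as the one in the %G+C row could be approximated by calling `pearson_r(windowed_gc(sequence), measured_occupancy)`, where `measured_occupancy` is a hypothetical list holding one occupancy value per window.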

Tillo and Hughes BMC Bioinformatics 2009 10:442   doi:10.1186/1471-2105-10-442
