Table 1. Comparison of nucleosome occupancy prediction models on different data sets

The six data set columns report Performance (Pearson R); the last column reports Correlation with %G+C (Yeast, 150 bp windows).

| Model | Summary | Synthetic oligonucleotides (Microarray) [8] | Synthetic oligonucleotides (Sequencing) [8] | Yeast in vitro [8] | Yeast in vivo [2] | C. elegans adjusted nucleosome coverage [34] | C. elegans normalized occupancy [34] | Correlation with %G+C |
|---|---|---|---|---|---|---|---|---|
| Kaplan et al., 2009[8] | Probabilistic model based on in vitro 5-mer preferences and periodic dinucleotide signal. | 0.51* | 0.45* | 0.89* | 0.34 | 0.47* | 0.61* | 0.87 |
| Lasso model (this study) | See Methods. | 0.44 | 0.41 | 0.86* | 0.38* | 0.49* | 0.66* | 0.85 |
| Field et al., 2008[24] | Probabilistic model based on 5-mer preferences measured in vivo (yeast) and periodic dinucleotide signals. | 0.47* | 0.45* | 0.74 | 0.39* | 0.46* | 0.61* | 0.64 |
| %G+C | The percentage of guanine and cytosine bases in a DNA sequence. | 0.53* | 0.49* | 0.78* | 0.25 | 0.42 | 0.47 | 1 |
| Lasso model[2] | Linear regression model trained on in vivo nucleosome occupancy data. Uses DNA structural parameters, excluding sequences and transcription factor binding sites (ABF1, REB1, and STB2) as inputs. | 0.23 | 0.22 | 0.63 | 0.45* | 0.38 | 0.5 | 0.55 |
| Peckham et al., 2007[25] | SVM classifier trained on overrepresented k-mers (k = 1-6) found in nucleosome occupied and depleted sequences determined in vivo yeast data. | 0.43 | 0.39 | 0.48 | 0.22 | 0.29 | 0.33 | 0.57 |
| Yuan and Liu, 2008[26] | Computes predicted nucleosome occupancy based on periodic dinucleotide signals found in nucleosomal and linker DNA sequences determined from in vitro and in vivo experiments in yeast. | 0.02 | 0.05 | 0.35 | 0.27 | 0.36 | 0.48 | 0.30 |
| Miele et al., 2008[29] | Computes free energy landscape of nucleosome formation using an estimation of dinucleotide-dependent DNA flexibility and intrinsic curvature. | 0.32 | 0.26 | 0.38 | 0.22 | 0.21 | 0.25 | 0.49 |
| Segal et al., 2006[23] Downloaded January 2007 | Probabilistic model trained on yeast data, using a position specific scoring matrix derived from a collection of nucleosome-bound sequences obtained from in vitro selection experiments. | NaN | NaN | 0.05 | 0.09 | 0.05 | 0.05 | 0.07 |
| Ioshikhes et al., 2006[22] | Computes the correlation of periodic AA/TT dinucleotide motifs in a given sequence with those found in a set of 204 eukaryotic and viral nucleosomal sequences determined through in vivo and in vitro experiments[20]. | -0.03 | -0.03 | 0.01 | 0.07 | -0.03 | -0.01 | 0.01 |
| | Estimates the dinucleotide-dependent cost of deformation caused by threading a given sequence on a template comprising the path of DNA found on the experimentally determined structure of the nucleosome core particle. | 0.01 | 0.004 | 0 | -0.001 | -0.001 | -0.001 | -0.0003 |
| Segal et al., 2006[23] Downloaded August 2009 | Probabilistic model trained on yeast data, using a position specific scoring matrix derived from a collection of nucleosome-bound sequences obtained from in vitro selection experiments. | NaN | NaN | -0.2 | 0.001 | -0.06 | -0.05 | -0.21 |

Pearson correlation is shown as the performance metric. Nucleosome occupancy was predicted using only sequence from the test set in yeast (chr10-16) and from chromosome III in C. elegans. "NaN" indicates that a score of "0" was obtained for every sequence, since this model[23] requires sequences longer than 150 bp. Models are sorted by their average rank in performance. Asterisks (*) denote the top three performing models for each data set, and text in bold denotes the top 50%.
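
As an illustration of the two quantities reported in the table, the following minimal sketch computes %G+C in 150 bp windows and the Pearson correlation between two score vectors. It is not the code used in the study: the function names `gc_percent_windows` and `pearson_r` are hypothetical, and the use of non-overlapping windows is an assumption, since the table states only the window size.

```python
from math import sqrt


def gc_percent_windows(seq, window=150, step=150):
    """Return %G+C for each full window of `window` bp along `seq`.

    Assumption: windows advance by `step` bp (non-overlapping by default);
    a trailing partial window is ignored.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns NaN when either input has zero variance, because the
    coefficient is undefined in that case.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)
```

In this sketch a constant prediction vector yields NaN, which is the generic reason a correlation cannot be computed when a model assigns the same score to every sequence.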
Tillo and Hughes BMC Bioinformatics 2009 10:442 doi:10.1186/1471-2105-10-442 |