This article is part of the supplement: International Workshop on Computational Systems Biology: Approaches to Analysis of Genome Complexity and Regulatory Gene Networks

Open Access Research

Statistics of protein-DNA binding and the total number of binding sites for a transcription factor in the mammalian genome

Vladimir A Kuznetsov1*, Onkar Singh2 and Piroon Jenjaroenpun1

Author Affiliations

1 Department of Genome and Gene Expression Data Analysis, Bioinformatics Institute, 30 Biopolis str #07-01, Singapore, 138671

2 Laboratory of Clinical Pharmacology, Division of Medical Sciences, National Cancer Centre, 11 Hospital Drive, Singapore 169610

BMC Genomics 2010, 11(Suppl 1):S12  doi:10.1186/1471-2164-11-S1-S12

Published: 10 February 2010

Additional files

Additional file 1:

GDP function fitting and extrapolation to noisy events for the Esrrb TF library. The empirical relative frequency distribution of peak height intensities for Esrrb is fitted by the GDP function. Log-log plot: frequency of peak height intensity for the Esrrb library; solid circles: observed frequencies at cut-off 12; solid line: best-fit GDP function with parameters k = 2.40 ± 0.0778, b = 10.42 ± 0.6828. The fitted curve is extrapolated with the same parameters to predict the TFBSs among the noise-enriched binding events of the library.

Format: PDF Size: 13KB
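The extrapolation described for Additional file 1 can be illustrated with a minimal sketch. It assumes the GDP function has the form f(m) ∝ (m + b)^-(k + 1), which is not stated in this listing, and it uses the reported Esrrb parameters k = 2.40 and b = 10.42. The function names, the observed total, and the peak height range are hypothetical values chosen only for illustration.

```python
def gdp_weights(heights, k, b):
    """Unnormalized weights (m + b)^-(k + 1) for each peak height m.

    The functional form is an assumption, not taken from this listing.
    """
    return {m: (m + b) ** -(k + 1) for m in heights}


def extrapolate_counts(observed_total, cutoff, m_min, m_max, k, b):
    """Predict counts below the cut-off from the counts observed at or above it.

    The curve is scaled so that its sum over m >= cutoff equals observed_total,
    then evaluated for peak heights m_min .. cutoff - 1.
    """
    weights = gdp_weights(range(m_min, m_max + 1), k, b)
    above = sum(w for m, w in weights.items() if m >= cutoff)
    scale = observed_total / above
    return {m: scale * w for m, w in weights.items() if m < cutoff}


# Hypothetical inputs: 1000 events observed at peak height >= 12,
# peak heights considered from 1 to 100.
predicted = extrapolate_counts(1000, cutoff=12, m_min=1, m_max=100, k=2.40, b=10.42)
for height in sorted(predicted):
    print(height, round(predicted[height], 1))
```

The predicted values are expected counts under the fitted curve, not observed data; the difference between them and the raw counts below the cut-off would correspond to the noise-enriched portion of the library.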

Additional file 2:

K-W model fits to the observed and best-fit GDP-derived data, with calculation of p0. Vertical dotted lines represent the qPCR experimental threshold and the Improved Model threshold. Table 1 gives the parameters of the K-W model fit for all TFs.

Format: PDF Size: 191KB

Additional file 3:

Fitting statistics of the GDP model to the empirical frequency distribution of binding events. t[k] and t[b] are the t-test values for k and b, respectively; p[k] and p[b] are the corresponding p-values. F is the Fisher criterion (computed by SigmaPlot).

Format: XLS Size: 30KB

Additional file 4:

The numbers of TFBS-specific DNA fragments at different specificity thresholds. The observed numbers of DNA fragments are compared with those predicted by the best-fit GDP function.

Format: XLS Size: 25KB

Additional file 5:

Mutual agreement of the best-fit K-W and GDP functions. Both functions provide an accurate estimate of the number of specific ChIP-seq DNA fragments in reliably defined TF binding sites.

Format: XLS Size: 23KB

Additional file 6:

Venn diagrams of the numbers of co-localized E-boxes in ChIP-seq defined binding loci. A: Venn diagram of the number of E-box-positive loci found within ± 150 bp of the centre of ChIP-seq defined binding loci. B: Venn diagram of the number of E-box-positive loci found within ± 250 bp of the centre of ChIP-seq defined binding loci.

Format: PDF Size: 111KB
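The window-based E-box counts described for Additional file 6 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the canonical E-box CACGTG (the listing does not say which E-box definition was used), and the function names and the toy loci are hypothetical.

```python
import re

# Canonical E-box; an assumption. CACGTG is its own reverse complement,
# so scanning a single strand is sufficient for this motif.
EBOX = re.compile("CACGTG")


def ebox_positions(seq, center, half_window):
    """Return 0-based start positions of E-boxes lying wholly within
    center ± half_window of the given sequence."""
    start = max(0, center - half_window)
    end = min(len(seq), center + half_window + 1)
    window = seq[start:end].upper()
    return [start + m.start() for m in EBOX.finditer(window)]


def count_ebox_positive(loci, half_window):
    """Count loci, given as (sequence, center) pairs, with at least one E-box in the window."""
    return sum(1 for seq, center in loci if ebox_positions(seq, center, half_window))


# Toy loci: one E-box 3 bp before the centre, one E-box 200 bp after it.
near = ("A" * 200 + "CACGTG" + "A" * 200, 203)
far = ("A" * 400 + "CACGTG" + "A" * 10, 200)
print(count_ebox_positive([near, far], 150))  # window of ± 150 bp
print(count_ebox_positive([near, far], 250))  # window of ± 250 bp
```

Widening the window from ± 150 bp to ± 250 bp can only add E-box-positive loci, which is why the two panels are reported separately.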

Additional file 7:

Number of c-Myc binding loci containing E-boxes at different peak heights. A: Number of c-Myc binding loci (± 150 bp from the c-Myc locus center) containing E-boxes. B: Number of c-Myc binding loci (± 250 bp from the c-Myc locus center) containing E-boxes.

Format: XLS Size: 31KB

Additional file 8:

Validation of ChIP-seq defined c-Myc binding loci based on the localization of c-Myc binding loci, E-boxes and putative gene promoters in the mouse genome. A: Number of c-Myc BSs with and without E-boxes within ± 1 kb of a TSS, and number of genes that have c-Myc BSs with and without E-boxes. B: Number of BSs that lie within ± 1 kb of a TSS and contain an E-box. The numbers are separated into two groups: 1: relatively low avidity (peak height 7-8); 2: moderate and high avidity (peak height 9+).

Format: XLS Size: 31KB

Additional file 9:

Number of E-boxes found within c-Myc binding loci at different peak heights. A: Number of E-boxes found within c-Myc binding loci (± 150 bp of center). B: Number of motifs found within c-Myc binding loci (± 250 bp of center).

Format: XLS Size: 31KB

Additional file 10:

TF-gene association scores. High-, moderate-, and relatively low-avidity c-Myc binding loci with multiple E-boxes in the putative promoter regions of Wee1, Npm3, and Fkbp5, respectively. Gene enrichment classes: I - genes enriched with binding sites for Nanog, Oct4, Sox2, Smad1, and STAT3; II - genes enriched with binding sites for c-Myc and n-Myc. The association score estimates the distance between each binding locus and gene pair, based on the genomic location of the binding locus closest to the TSS.

Format: XLS Size: 19KB