Table 2

Distribution of transcription factor binding sites across mosaic classes

Data source (1)     Factor (2)  P (3)  Sites (4)  Pair 2 (5)  Pair 7 (5)  Pair 9 (5)  Pair 14 (5)  Total (6)
----------------------------------------------------------------------------------------------------------
HAIB-K562           GABP        0.71   2557       0.054       0.035       0.113       0.774        0.976
HAIB-K562           NRSF        0.74   2006       0.236       0.231       0.254       0.142        0.862
HAIB-K562           SRF         0.64   367        0.370       0.083       0.111       0.229        0.794
YALE-GM128          NFKB        0.50   2653       0.322       0.156       0.139       0.069        0.686
YALE-HCT116         TCF7L2      0.50   3386       0.281       0.111       0.060       0.030        0.483
YALE-HepG2          SREBP1      0.50   4958       0.237       0.092       0.137       0.276        0.742
YALE-K562b          GATA1       0.52   3367       0.322       0.221       0.146       0.048        0.736
YALE-K562b          TR4         0.51   541        0.144       0.083       0.216       0.426        0.870
YALE-K562b          ZNF263      0.64   5098       0.049       0.194       0.466       0.147        0.856
YALE-K562           cFos        0.53   3746       0.287       0.186       0.111       0.018        0.603
YALE-K562           Max         0.60   3176       0.210       0.100       0.180       0.185        0.675
YALE-K562           NF-E2       0.81   4700       0.273       0.149       0.088       0.026        0.536
YALE-NT2D1          YY1         0.50   2967       0.252       0.135       0.157       0.333        0.876
YALE-K562-Ia30      STAT1       0.50   1039       0.398       0.104       0.077       0.059        0.638
ORegAnno            CTCF        1.00   4858       0.202       0.181       0.353       0.169        0.905
TRANSFAC            sp1         0.62   693        0.045       0.075       0.332       0.512        0.966
TRANSFAC            p53         0.91   608        0.266       0.203       0.081       0.021        0.571
----------------------------------------------------------------------------------------------------------
Average of above                                  0.232       0.138       0.178       0.204        0.751
----------------------------------------------------------------------------------------------------------
Genome proportions                                0.134       0.071       0.042       0.009        0.256
Model proportions                                 0.134       0.069       0.042       0.009        0.255
Promoter region                                   0.170       0.069       0.128       0.233        0.600

1) For ENCODE data, this column gives the track, the cell line and, where applicable, a note on the experimental protocol. For the HAIB data, replication 1 was used.
2) Name of the factor.
3) P is the proportion of sequences in which MAST found a binding site.
4) The number of sites found by MAST. The later columns show the proportion of these sites in the specified class pair.
5) The values quoted are the average probabilities of the site being in the class(es); they are not maximum likelihood estimates.
6) Total of the preceding four columns.
7) Three lines of comparative figures are given. The line "Genome proportions" gives the result of applying the analysis to 20 thousand bases chosen at random from the genome. The line "Model proportions" gives the long-term average of the HMM. The line "Promoter region" gives the proportions found by applying the model to the bases within 1000 bases upstream of the transcription start site of all coding genes. The near equality of "Genome proportions" and "Model proportions" is a cross-check on the consistency of the calculations.
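The arithmetic behind notes 5 to 7 can be checked directly from the printed values. The following is a minimal sketch, not the author's code: the values are transcribed from the table, the function names (`max_total_discrepancy`, `column_means`, `enrichment`) are hypothetical, and it assumes that "Average of above" is an unweighted mean over the 17 factor rows. The ratio to the genome proportions is only an illustrative comparison, not a statistic quoted in the source.

```python
# Rows transcribed from Table 2:
# (data source, factor, pair 2, pair 7, pair 9, pair 14, total)
ROWS = [
    ("HAIB-K562", "GABP", 0.054, 0.035, 0.113, 0.774, 0.976),
    ("HAIB-K562", "NRSF", 0.236, 0.231, 0.254, 0.142, 0.862),
    ("HAIB-K562", "SRF", 0.370, 0.083, 0.111, 0.229, 0.794),
    ("YALE-GM128", "NFKB", 0.322, 0.156, 0.139, 0.069, 0.686),
    ("YALE-HCT116", "TCF7L2", 0.281, 0.111, 0.060, 0.030, 0.483),
    ("YALE-HepG2", "SREBP1", 0.237, 0.092, 0.137, 0.276, 0.742),
    ("YALE-K562b", "GATA1", 0.322, 0.221, 0.146, 0.048, 0.736),
    ("YALE-K562b", "TR4", 0.144, 0.083, 0.216, 0.426, 0.870),
    ("YALE-K562b", "ZNF263", 0.049, 0.194, 0.466, 0.147, 0.856),
    ("YALE-K562", "cFos", 0.287, 0.186, 0.111, 0.018, 0.603),
    ("YALE-K562", "Max", 0.210, 0.100, 0.180, 0.185, 0.675),
    ("YALE-K562", "NF-E2", 0.273, 0.149, 0.088, 0.026, 0.536),
    ("YALE-NT2D1", "YY1", 0.252, 0.135, 0.157, 0.333, 0.876),
    ("YALE-K562-Ia30", "STAT1", 0.398, 0.104, 0.077, 0.059, 0.638),
    ("ORegAnno", "CTCF", 0.202, 0.181, 0.353, 0.169, 0.905),
    ("TRANSFAC", "sp1", 0.045, 0.075, 0.332, 0.512, 0.966),
    ("TRANSFAC", "p53", 0.266, 0.203, 0.081, 0.021, 0.571),
]

# Summary rows as printed: (pair 2, pair 7, pair 9, pair 14, total)
AVERAGE_ROW = (0.232, 0.138, 0.178, 0.204, 0.751)
GENOME_ROW = (0.134, 0.071, 0.042, 0.009, 0.256)


def max_total_discrepancy(rows):
    """Largest gap between a printed Total and the sum of its four pair columns."""
    return max(abs(sum(row[2:6]) - row[6]) for row in rows)


def column_means(rows):
    """Unweighted mean of each numeric column, rounded to three decimals."""
    n = len(rows)
    return tuple(round(sum(row[i] for row in rows) / n, 3) for i in range(2, 7))


def enrichment(proportions, baseline):
    """Ratio of each proportion to the matching baseline proportion."""
    return tuple(round(p / b, 1) for p, b in zip(proportions, baseline))


if __name__ == "__main__":
    print("max |row sum - Total|:", round(max_total_discrepancy(ROWS), 4))
    print("recomputed column means:", column_means(ROWS))
    print("average / genome ratio:", enrichment(AVERAGE_ROW, GENOME_ROW))
```

Under these assumptions the recomputed column means match the "Average of above" row, and each printed Total agrees with its row sum to within about 0.002, which is what rounding each entry to three decimals would allow. The ratio for Pair 14 (0.204 against a genome proportion of 0.009) is by far the largest of the four.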

Evans BMC Genomics 2010 11:286   doi:10.1186/1471-2164-11-286
