Table 6

The top 25 words in Proximal Promoters

| Word | S (unmasked) | ES (unmasked) | O (unmasked) | EO (unmasked) | SlnSES (unmasked) | S (masked) | ES (masked) | O (masked) | EO (masked) | SlnSES (masked) | RevComp (unmasked) | RC_Pos (unmasked) | Pal (unmasked) | PValues (unmasked) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TAAAAAAT | 4249 | 3411.11 | 4837 | 3674.74 | 933.272 | 3681 | 3028.65 | 4071 | 3237.18 | 718.039 | ATTTTTTA | 1 | No | 0 |
| ATTTTTTA | 3876 | 3135.31 | 4372 | 3358.5 | 822.011 | 3313 | 2758.58 | 3636 | 2932.38 | 606.738 | TAAAAAAT | 0 | No | 2.22E-16 |
| TTATATAA | 3094 | 2505.92 | 3390 | 2650.31 | 652.239 | 2712 | 2508.38 | 2934 | 2653.02 | 211.674 | TTATATAA | 2 | Yes | 7.77E-16 |
| AATATATT | 3636 | 3104.08 | 4093 | 3322.92 | 575.097 | 3178 | 3009.54 | 3503 | 3215.49 | 173.09 | AATATATT | 3 | Yes | 1.67E-15 |
| GAAAAAAG | 2066 | 1652.5 | 2182 | 1718.49 | 461.395 | 1956 | 1621.19 | 2053 | 1684.9 | 367.226 | CTTTTTTC | 5 | No | 1.11E-16 |
| CTTTTTTC | 1960 | 1578.31 | 2072 | 1638.97 | 424.512 | 1869 | 1559.58 | 1969 | 1618.92 | 338.269 | GAAAAAAG | 4 | No | 1.11E-16 |
| AAAAATTG | 2975 | 2595.17 | 3208 | 2749.61 | 406.363 | 2737 | 2368.41 | 2938 | 2497.98 | 395.888 | CAATTTTT | 9 | No | -6.66E-16 |
| TAAAATTT | 4339 | 3951.48 | 5058 | 4305.15 | 405.93 | 3764 | 3348.9 | 4214 | 3603.07 | 439.821 | AAATTTTA | 10 | No | -6.66E-16 |
| TAATTTTT | 4656 | 4272.02 | 5336 | 4686.12 | 400.739 | 4125 | 3726.41 | 4609 | 4040.78 | 419.188 | AAAAATTA | 19 | No | 0 |
| CAATTTTT | 2872 | 2499.79 | 3110 | 2643.5 | 398.638 | 2633 | 2269.83 | 2829 | 2389.32 | 390.785 | AAAAATTG | 6 | No | 6.66E-16 |
| AAATTTTA | 4239 | 3880.57 | 4921 | 4221.59 | 374.5 | 3651 | 3305.77 | 4102 | 3553.5 | 362.665 | TAAAATTT | 7 | No | 8.88E-16 |
| TACAAAAT | 2589 | 2241.1 | 2821 | 2357.73 | 373.61 | 2344 | 2040.96 | 2514 | 2138.69 | 324.496 | ATTTTGTA | 26 | No | 6.66E-16 |
| ATTTTCTA | 2206 | 1886.09 | 2346 | 1970.39 | 345.622 | 2022 | 1748.93 | 2142 | 1822.19 | 293.357 | TAGAAAAT | 17 | No | 8.88E-16 |
| TGAAAAAT | 2374 | 2075.6 | 2517 | 2176.47 | 318.891 | 2230 | 1927.32 | 2354 | 2015.09 | 325.288 | ATTTTTCA | 21 | No | 5.64E-13 |
| AAAAAATC | 3874 | 3607.85 | 4265 | 3902.57 | 275.738 | 3494 | 3280.06 | 3823 | 3524 | 220.77 | GATTTTTT | 68 | No | 5.63E-09 |
| CATTTTTC | 1675 | 1426.93 | 1760 | 1477.44 | 268.478 | 1558 | 1356.8 | 1624 | 1402.92 | 215.428 | GAAAAATG | 29 | No | 5.16E-13 |
| TAAGAAAT | 1895 | 1645.36 | 1990 | 1710.83 | 267.683 | 1773 | 1553.49 | 1856 | 1612.42 | 234.336 | ATTTCTTA | 23 | No | 2.52E-11 |
| TAGAAAAT | 2154 | 1904.65 | 2281 | 1990.5 | 265.005 | 1971 | 1754.61 | 2083 | 1828.31 | 229.215 | ATTTTCTA | 12 | No | 1.04E-10 |
| GGAAAAAA | 2679 | 2426.86 | 2853 | 2562.63 | 264.801 | 2506 | 2238.07 | 2643 | 2354.4 | 283.363 | TTTTTTCC | 98 | No | 9.20E-09 |
| AAAAATTA | 4735 | 4477.84 | 5547 | 4933.58 | 264.404 | 4109 | 3862.67 | 4667 | 4200.51 | 254.025 | TAATTTTT | 8 | No | 1.33E-15 |
| CAAAATTT | 3347 | 3092.9 | 3655 | 3310.2 | 264.267 | 3054 | 2796.42 | 3304 | 2974.88 | 269.093 | AAATTTTG | 60 | No | 1.95E-09 |
| ATTTTTCA | 2338 | 2088.5 | 2489 | 2190.56 | 263.846 | 2169 | 1928.62 | 2295 | 2016.5 | 254.769 | TGAAAAAT | 13 | No | 2.29E-10 |
| TTTTTTGG | 3369 | 3120.79 | 3724 | 3341.96 | 257.829 | 3050 | 2802.67 | 3330 | 2981.91 | 257.935 | CCAAAAAA | 28 | No | 4.49E-11 |
| ATTTCTTA | 1947 | 1705.79 | 2052 | 1775.75 | 257.518 | 1800 | 1598.57 | 1900 | 1660.66 | 213.623 | TAAGAAAT | 16 | No | 8.37E-11 |

Top 25 overrepresented words in the proximal promoters of Arabidopsis thaliana. The Word attribute is the short nucleotide sequence of a putative word. S and ES are the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively; O and EO are the observed and expected total numbers of occurrences. The SlnSES score is a statistical measure of how well a word covers the sequences in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the corresponding segment, with emphasis placed on the unmasked version (i.e., the table is sorted by the unmasked SlnSES score).
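The column name suggests that the score is S multiplied by the natural logarithm of S/ES, and the tabulated values are consistent with that reading. The sketch below is an inference from the table, not a definition stated in this caption; the helper name `sln_ses` is hypothetical, and small differences are expected because ES is printed to two decimals.

```python
import math


def sln_ses(s, es):
    """Hypothetical helper: S * ln(S / ES), as inferred from the column name."""
    if s <= 0 or es <= 0:
        raise ValueError("S and ES must be positive")
    return s * math.log(s / es)


# (word, S, ES, reported SlnSES) copied from the unmasked columns of the table.
rows = [
    ("TAAAAAAT", 4249, 3411.11, 933.272),
    ("ATTTTTTA", 3876, 3135.31, 822.011),
    ("TTATATAA", 3094, 2505.92, 652.239),
]

for word, s, es, reported in rows:
    # Compare the recomputed score with the value printed in the table.
    print(f"{word}: computed {sln_ses(s, es):.3f}, reported {reported}")
```

A score above zero means the word occurs in more sequences than the background model predicts, and the score grows with both the size of that excess and the number of sequences covered.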

Further information about each word is provided by its reverse complement (RevComp), the position of the reverse complement in the set of results (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal).
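As a minimal sketch of how the RevComp and Pal attributes relate to a word, assuming the standard DNA base pairing (A with T, C with G) and taking a genomic palindrome to be a word equal to its own reverse complement. The function names are illustrative and not taken from the authors' software. The RC_Pos values in the table appear to be zero-based ranks, which the sketch does not model.

```python
# Translation table for the standard DNA complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement each base, then reverse the word."""
    return word.upper().translate(_COMPLEMENT)[::-1]


def is_palindrome(word):
    """A genomic palindrome reads the same as its reverse complement."""
    return word.upper() == reverse_complement(word)


for word in ("TAAAAAAT", "TTATATAA", "GAAAAAAG"):
    pal = "Yes" if is_palindrome(word) else "No"
    print(word, reverse_complement(word), pal)
```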

Finally, PValues gives the p-value assigned to each word, which indicates whether the word is likely to be relevant or was flagged as interesting by random chance.

Lichtenberg et al. BMC Genomics 2009 10:463 doi:10.1186/1471-2164-10-463
