Table 7

The top 25 words in Distal Promoters

| Word | S (unmasked) | ES (unmasked) | O (unmasked) | EO (unmasked) | SlnSES (unmasked) | S (masked) | ES (masked) | O (masked) | EO (masked) | SlnSES (masked) | RevComp (unmasked) | RC_Pos (unmasked) | Pal (unmasked) | PValues (unmasked) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATTTTTTA | 5789 | 4874.02 | 7202 | 5393.37 | 995.937 | 4920 | 4189.9 | 5773 | 4568.53 | 790.309 | TAAAAAAT | 1 | No | 6.66E-16 |
| TAAAAAAT | 5865 | 4983.57 | 7314 | 5527.8 | 955.154 | 5003 | 4269.17 | 5877 | 4662.83 | 793.568 | ATTTTTTA | 0 | No | 6.66E-16 |
| GAAAAAAG | 3578 | 2825.77 | 3921 | 2995.09 | 844.484 | 3394 | 2744.34 | 3697 | 2903.99 | 721.112 | CTTTTTTC | 3 | No | 8.88E-16 |
| CTTTTTTC | 3546 | 2878.92 | 3904 | 3054.71 | 739.005 | 3345 | 2798.31 | 3662 | 2964.33 | 596.918 | GAAAAAAG | 2 | No | 0 |
| TTATATAA | 4781 | 4107.17 | 5656 | 4470.46 | 726.305 | 4138 | 3955.09 | 4717 | 4291.1 | 187.08 | TTATATAA | 4 | Yes | 0 |
| AATATATT | 5432 | 4895.21 | 6702 | 5419.31 | 565.205 | 4688 | 4574.65 | 5538 | 5029.33 | 114.742 | AATATATT | 5 | Yes | 0 |
| CAAGAAAC | 2910 | 2459.44 | 3187 | 2587.64 | 489.513 | 2818 | 2410.32 | 3089 | 2533.47 | 440.364 | GTTTCTTG | 7 | No | -4.44E-16 |
| GTTTCTTG | 2912 | 2482.93 | 3182 | 2613.58 | 464.176 | 2842 | 2430.36 | 3108 | 2555.55 | 444.685 | CAAGAAAC | 6 | No | 0 |
| GAAAAATG | 3158 | 2736.51 | 3416 | 2895.24 | 452.402 | 2871 | 2566.09 | 3080 | 2705.63 | 322.343 | CATTTTTC | 29 | No | 0 |
| GTTTTTGA | 3516 | 3093.27 | 3830 | 3296.52 | 450.382 | 3207 | 2816.69 | 3462 | 2984.91 | 416.186 | TCAAAAAC | 13 | No | 8.88E-16 |
| GAAAAAAC | 3013 | 2605.34 | 3240 | 2749.19 | 438.004 | 2744 | 2495.22 | 2935 | 2627.17 | 260.786 | GTTTTTTC | 26 | No | 5.55E-16 |
| CAATTTTT | 4457 | 4041.77 | 4991 | 4393.18 | 435.864 | 4009 | 3601.54 | 4440 | 3878.67 | 429.685 | AAAAATTG | 25 | No | 1.67E-15 |
| ATTTTGTA | 4098 | 3689.96 | 4626 | 3981.23 | 429.814 | 3735 | 3342.23 | 4123 | 3580.11 | 414.995 | TACAAAAT | 69 | No | 1.55E-15 |
| TCAAAAAC | 3414 | 3011.29 | 3688 | 3203.78 | 428.513 | 3129 | 2749.95 | 3358 | 2910.25 | 404.054 | GTTTTTGA | 9 | No | 7.77E-16 |
| GAAGAAAG | 3851 | 3448.5 | 4291 | 3702.07 | 425.126 | 3664 | 3290.44 | 4048 | 3520.87 | 394.006 | CTTTCTTC | 59 | No | 1.11E-16 |
| GTTTTATG | 2173 | 1793.07 | 2293 | 1861.81 | 417.607 | 2048 | 1720.91 | 2156 | 1784.36 | 356.372 | CATAAAAC | 57 | No | 1.11E-16 |
| CTTTATTC | 1618 | 1250.45 | 1676 | 1284.79 | 416.937 | 1500 | 1215.7 | 1548 | 1248.25 | 315.217 | GAATAAAG | 43 | No | 4.44E-16 |
| GTTTTAAG | 1957 | 1584.64 | 2054 | 1638.71 | 413.031 | 1791 | 1482.73 | 1871 | 1530.29 | 338.304 | CTTAAAAC | 28 | No | 1.33E-15 |
| ATTTTTCA | 4081 | 3695.36 | 4496 | 3987.5 | 405.1 | 3743 | 3364 | 4095 | 3605.05 | 399.585 | TGAAAAAT | 40 | No | 6.66E-16 |
| TAAGAAGT | 1465 | 1112.41 | 1517 | 1139.93 | 403.359 | 1388 | 1100.56 | 1435 | 1127.54 | 322.073 | ACTTCTTA | 62 | No | -8.88E-16 |
| CTTGTTTC | 2351 | 1980.52 | 2504 | 2064.03 | 403.153 | 2269 | 1929.76 | 2415 | 2009.12 | 367.453 | GAAACAAG | 35 | No | 0 |
| CAAAAAAG | 3391 | 3011.99 | 3696 | 3204.57 | 401.915 | 3126 | 2864.52 | 3392 | 3038.54 | 273.068 | CTTTTTTG | 88 | No | 0 |
| TAGAAAAT | 3556 | 3178.38 | 3887 | 3393.13 | 399.217 | 3219 | 2901.76 | 3488 | 3080.38 | 333.981 | ATTTTCTA | 41 | No | 0 |
| ATTCTTCA | 2716 | 2348.17 | 2896 | 2465.08 | 395.248 | 2529 | 2255.7 | 2691 | 2363.65 | 289.221 | TGAAGAAT | 31 | No | 1.11E-16 |

Top 25 overrepresented words for the distal promoters in Arabidopsis thaliana. The Word attribute gives the short nucleotide sequence of a putative word. S and ES give the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively, while O and EO give the total number of occurrences and the expected total number of occurrences. The SlnSES score measures the statistical coverage of the analyzed sequence set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the corresponding segment, with emphasis on the unmasked version (i.e., the table is sorted by the unmasked SlnSES score).
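The caption does not spell out how SlnSES is computed. The column name and the tabulated values are both consistent with S · ln(S / ES), so the sketch below assumes that formula. The function name `s_ln_s_es` is hypothetical, and ES is taken directly from the table rather than derived from a Markov chain background model.

```python
import math


def s_ln_s_es(s: int, es: float) -> float:
    """Assumed score S * ln(S / ES).

    s  -- number of sequences in which the word occurs (column S)
    es -- number of sequences in which it was expected to occur (column ES)
    """
    if s <= 0 or es <= 0:
        raise ValueError("s and es must be positive")
    return s * math.log(s / es)


# S and ES for the word ATTTTTTA, copied from the table.
unmasked = s_ln_s_es(5789, 4874.02)
masked = s_ln_s_es(4920, 4189.9)

# Compare against the tabulated SlnSES values 995.937 and 790.309.
print(f"unmasked: {unmasked:.3f}  masked: {masked:.3f}")
```

Under this assumption, the computed scores should agree with the tabulated ones up to the rounding of ES in the table.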

Further information on each word is provided by its reverse complement (RevComp), the position of the reverse complement in the result set (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal).
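The three attributes can be illustrated with a minimal sketch. The helper names `reverse_complement`, `is_palindrome` and `rc_position` are hypothetical. The sketch assumes that a genomic palindrome is a word equal to its own reverse complement and that RC_Pos is a zero-based rank, which is how the values in the table appear to behave.

```python
from typing import List, Optional

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement every base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A genomic palindrome reads the same as its reverse complement."""
    return word == reverse_complement(word)


def rc_position(word: str, ranked_words: List[str]) -> Optional[int]:
    """Zero-based rank of the reverse complement, or None if it is not listed."""
    rc = reverse_complement(word)
    return ranked_words.index(rc) if rc in ranked_words else None


# The first six words of the table, in table order.
top_words = ["ATTTTTTA", "TAAAAAAT", "GAAAAAAG", "CTTTTTTC", "TTATATAA", "AATATATT"]

for w in top_words:
    print(w, reverse_complement(w), rc_position(w, top_words), is_palindrome(w))
```

For these six words, the sketch gives the same RevComp, RC_Pos and Pal values as the table. Words whose reverse complement ranks outside the supplied list return `None`, because the full result set is not reproduced here.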

Finally, PValues gives the p-value assigned to each word, which provides statistical insight into whether the word is relevant or was flagged as interesting by random chance.

Lichtenberg et al. BMC Genomics 2009 10:463   doi:10.1186/1471-2164-10-463
