Table 5

The top 25 words in Core Promoters

|          | Unmasked |         |      |         |         | Masked |         |      |         |         | Unmasked |        |     |           |
| Word     | S        | ES      | O    | EO      | SlnSES  | S      | ES      | O    | EO      | SlnSES  | RevComp  | RC_Pos | Pal | PValues   |
|----------|----------|---------|------|---------|---------|--------|---------|------|---------|---------|----------|--------|-----|-----------|
| TATAAATA | 1355     | 1071.69 | 1369 | 1175.57 | 317.831 | 1300   | 1029.92 | 1311 | 1128.85 | 302.753 | TATTTATA | 69     | No  | 2.02E-08  |
| CTATAAAT | 712      | 474.27  | 716  | 514.446 | 289.286 | 704    | 464.711 | 708  | 503.987 | 292.416 | ATTTATAG | 2504   | No  | 7.77E-16  |
| CTATATAA | 636      | 410.261 | 638  | 444.486 | 278.826 | 626    | 450.579 | 628  | 488.533 | 205.839 | TTATATAG | 18530  | No  | 1.11E-16  |
| ATATAAAC | 560      | 350.797 | 560  | 379.643 | 261.928 | 554    | 347.685 | 554  | 376.253 | 258.091 | GTTTATAT | 26957  | No  | 4.44E-16  |
| TAAAAAAT | 473      | 295.342 | 480  | 319.301 | 222.765 | 453    | 298.58  | 460  | 322.82  | 188.835 | ATTTTTTA | 12     | No  | -2.22E-16 |
| ATATATAC | 544      | 394.869 | 559  | 427.688 | 174.295 | 507    | 330.093 | 515  | 357.099 | 217.573 | GTATATAT | 5651   | No  | 7.41E-10  |
| AATATATT | 300      | 181.346 | 300  | 195.646 | 151.012 | 287    | 195.452 | 287  | 210.918 | 110.256 | AATATATT | 6      | Yes | 2.74E-12  |
| TTATATAA | 524      | 397.031 | 529  | 430.047 | 145.398 | 514    | 430.79  | 518  | 466.905 | 90.7739 | TTATATAA | 7      | Yes | 2.22E-06  |
| AAGAAAAA | 1261     | 1129.24 | 1318 | 1240.05 | 139.165 | 1189   | 1063    | 1238 | 1165.84 | 133.189 | TTTTTCTT | 25     | No  | 0.014544  |
| ATATAAAG | 378      | 262.861 | 380  | 284.014 | 137.316 | 375    | 261.181 | 377  | 282.19  | 135.643 | CTTTATAT | 377    | No  | 3.41E-08  |
| TATATAAA | 1260     | 1131.11 | 1276 | 1242.15 | 135.966 | 1234   | 1102.41 | 1250 | 1209.97 | 139.143 | TTTATATA | 1458   | No  | 0.171817  |
| AGAAAAAA | 1127     | 1000.04 | 1170 | 1095.49 | 134.693 | 1063   | 936.863 | 1099 | 1025.06 | 134.271 | TTTTTTCT | 31     | No  | 0.01331   |
| ATTTTTTA | 312      | 204.097 | 315  | 220.282 | 132.415 | 299    | 207.163 | 302  | 223.604 | 109.715 | TAAAAAAT | 4      | No  | 1.17E-09  |
| TTTTAAAA | 688      | 568.245 | 696  | 617.46  | 131.571 | 658    | 543.865 | 665  | 590.7   | 125.351 | TTTTAAAA | 13     | Yes | 0.001019  |
| CTCTTCTC | 402      | 294.202 | 429  | 318.061 | 125.499 | 371    | 277.661 | 390  | 300.087 | 107.516 | GAGAAGAG | 444    | No  | 1.97E-09  |
| ACAAAAAA | 958      | 840.585 | 988  | 918.052 | 125.259 | 917    | 799.552 | 939  | 872.564 | 125.681 | TTTTTTGT | 45     | No  | 0.011607  |
| ATAAATAC | 578      | 466.039 | 582  | 505.44  | 124.446 | 574    | 459.992 | 578  | 498.825 | 127.095 | GTATTTAT | 14072  | No  | 0.000465  |
| TTATAAAA | 507      | 397.553 | 508  | 430.617 | 123.294 | 490    | 386.47  | 491  | 418.525 | 116.302 | TTTTATAA | 945    | No  | 0.000153  |
| AAATTAAA | 718      | 609.913 | 745  | 663.251 | 117.144 | 682    | 578.03  | 705  | 628.206 | 112.806 | TTTAATTT | 96     | No  | 0.000967  |
| GCCCATTA | 374      | 273.89  | 396  | 295.991 | 116.512 | 372    | 272.658 | 394  | 294.653 | 115.571 | TAATGGGC | 190    | No  | 1.82E-08  |
| AAAAAACA | 893      | 787.368 | 924  | 859.073 | 112.42  | 849    | 736.927 | 874  | 803.277 | 120.193 | TGTTTTTT | 33     | No  | 0.014723  |
| TTAAAAAA | 805      | 701.565 | 828  | 764.227 | 110.71  | 768    | 667.112 | 788  | 726.227 | 108.159 | TTTTTTAA | 27     | No  | 0.01177   |
| ATTAAAAA | 708      | 609.58  | 719  | 662.885 | 105.969 | 671    | 581.412 | 681  | 631.921 | 96.1611 | TTTTTAAT | 316    | No  | 0.016276  |
| GCCCAATA | 322      | 231.782 | 340  | 250.291 | 105.859 | 321    | 228.286 | 337  | 246.5   | 109.41  | TATTGGGC | 130    | No  | 4.26E-08  |

Top 25 overrepresented words in the core promoter regions of Arabidopsis thaliana. Word is the short nucleotide sequence of a putative word. S and ES are the number of sequences in which a word occurs and the number of sequences in which it was expected to occur, respectively; O and EO are the total number of occurrences and the expected total number of occurrences. The SlnSES score describes the statistical coverage of the sequences analyzed in the set and is based on a Markov chain background model. Each set of attributes was computed for both the masked and the unmasked version of the corresponding segment, with emphasis placed on the unmasked version (i.e. the table is sorted by the unmasked SlnSES score).
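The tabulated scores appear consistent with SlnSES = S × ln(S / ES), where ES would be supplied by the background model. This reading is inferred from the column name and the table values, not stated in the caption, so the following is only a minimal sketch under that assumption. The function name `s_ln_s_es` is hypothetical; the three sample rows are copied from the unmasked columns above.

```python
import math


def s_ln_s_es(s: int, es: float) -> float:
    """Score a word as S * ln(S / ES).

    Assumption: this formula is inferred from the table values; ES is taken
    as given (in the source it comes from a Markov chain background model).
    """
    return s * math.log(s / es)


# (S, ES, reported SlnSES) for three words, unmasked columns of the table.
rows = {
    "TATAAATA": (1355, 1071.69, 317.831),
    "CTATAAAT": (712, 474.27, 289.286),
    "AATATATT": (300, 181.346, 151.012),
}

for word, (s, es, reported) in rows.items():
    computed = s_ln_s_es(s, es)
    # Small differences are expected because ES is rounded in the table.
    print(f"{word}: computed={computed:.3f} reported={reported}")
```

Under this reading, a word scores highly when it covers many sequences (large S) and covers far more of them than expected (large S / ES).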

Further information about each word is provided by its reverse complement (RevComp), the position of the reverse complement in the set of results (RC_Pos), and a flag indicating whether the word is a genomic palindrome (Pal), i.e. identical to its own reverse complement.

Finally, PValues gives a p-value for each word, providing statistical insight into whether the word is relevant or was flagged as interesting by random chance.

Lichtenberg et al. BMC Genomics 2009 10:463   doi:10.1186/1471-2164-10-463
