Table 1

Top 25 words. The top 25 words for the bidirectional promoter set (a) and the unidirectional promoter set (b) of DNA-repair pathways. The words are sorted in descending order according to their statistical overrepresentation.

(a) Bidirectional


| Word | S | ES | O | EO | Sln(S/ES) | RevComp | Position | Palindrome | P-Value |
|---|---|---|---|---|---|---|---|---|---|
| TCGCGCCA | 4 | 0.918299 | 4 | 0.9375 | 5.88611 | TGGCGCGA | 12538 | No | 0.015391 |
| TCCCGGGA | 8 | 3.97165 | 8 | 4.26667 | 5.60208 | TCCCGGGA | 2 | Yes | 0.068606 |
| GGCCCGCC | 10 | 5.85012 | 11 | 6.5 | 5.36123 | GGCGGGCC | 21073 | No | 0.066821 |
| TCCCGGCT | 6 | 2.54354 | 6 | 2.66667 | 5.14921 | AGCCGGGA | NA | No | 0.054084 |
| CAGGGGCC | 4 | 1.1085 | 4 | 1.13514 | 5.13315 | GGCCCCTG | 14546 | No | 0.028413 |
| AGGGCCGT | 5 | 1.80245 | 5 | 1.86667 | 5.10145 | ACGGCCCT | 613 | No | 0.04142 |
| TCTGAGGA | 5 | 1.84222 | 6 | 1.90909 | 4.99234 | TCCTCAGA | 5391 | No | 0.013499 |
| CGTGGGGG | 5 | 1.86693 | 5 | 1.93548 | 4.92572 | CCCCCACG | 20402 | No | 0.047015 |
| TGCTGAGA | 4 | 1.17067 | 4 | 1.2 | 4.91487 | TCTCAGCA | NA | No | 0.033766 |
| CGCGGCCG | 4 | 1.17067 | 4 | 1.2 | 4.91487 | CGGCCGCG | 20259 | No | 0.033766 |
| TCTGGGAT | 2 | 0.180188 | 2 | 0.181818 | 4.8138 | ATCCCAGA | 2854 | No | 0.014655 |
| GGGGCCGG | 5 | 1.92725 | 5 | 2 | 4.76672 | CCGGCCCC | 20866 | No | 0.052648 |
| AGGGAGGG | 6 | 2.73111 | 6 | 2.87234 | 4.7223 | CCCTCCCT | 9852 | No | 0.07159 |
| AGAAAAGA | 3 | 0.632564 | 3 | 0.642857 | 4.66976 | TCTTTTCT | NA | No | 0.027559 |
| CGACTCCG | 3 | 0.632564 | 3 | 0.642857 | 4.66976 | CGGAGTCG | NA | No | 0.027559 |
| GGGCCAGG | 7 | 3.61284 | 7 | 3.85714 | 4.6299 | CCTGGCCC | 19875 | No | 0.096315 |
| ACTCCAGC | 5 | 2.02051 | 5 | 2.1 | 4.53045 | GCTGGAGT | NA | No | 0.062121 |
| CGGGCCGA | 5 | 2.05153 | 5 | 2.13333 | 4.45426 | TCGGCCCG | 6128 | No | 0.065478 |
| TGCGGAAT | 2 | 0.220092 | 2 | 0.222222 | 4.41371 | ATTCCGCA | NA | No | 0.021321 |
| GCCCCTCC | 8 | 4.63031 | 9 | 5.03226 | 4.37454 | GGAGGGGC | 7041 | No | 0.070206 |
| GCCGGCGA | 3 | 0.707627 | 3 | 0.72 | 4.33335 | TCGCCGGC | 20143 | No | 0.036618 |
| TGAAGCCA | 4 | 1.38876 | 4 | 1.42857 | 4.23154 | TGGCTTCA | NA | No | 0.056996 |
| GGCAGGGA | 6 | 3.01111 | 6 | 3.18182 | 4.1367 | TCCCTGCC | 10531 | No | 0.103337 |
| TGCCCGCG | 5 | 2.19845 | 5 | 2.29167 | 4.10844 | CGCGGGCA | NA | No | 0.082773 |
| CAGCAGCC | 6 | 3.02748 | 6 | 3.2 | 4.10418 | GGCTGCTG | 19198 | No | 0.105399 |


(b) Unidirectional


| Word | S | ES | O | EO | Sln(S/ES) | RevComp | Position | Palindrome | P-Value |
|---|---|---|---|---|---|---|---|---|---|
| ACCCGCCT | 4 | 0.716577 | 4 | 0.727273 | 6.87826 | AGGCGGGT | 19440 | No | 0.006562 |
| CTTCTTTC | 5 | 1.7686 | 5 | 1.81818 | 5.19624 | GAAAGAAG | 13567 | No | 0.037733 |
| AGGAAACA | 4 | 1.16659 | 4 | 1.19048 | 4.92885 | TGTTTCCT | 21667 | No | 0.032947 |
| GCAGGGCG | 6 | 2.75716 | 6 | 2.86957 | 4.66535 | CGCCCTGC | 1311 | No | 0.071337 |
| GGGGCTGC | 5 | 2.036 | 5 | 2.1 | 4.49226 | GCAGCCCC | 16359 | No | 0.062122 |
| TCTTCTTC | 4 | 1.30438 | 4 | 1.33333 | 4.48225 | GAAGAAGA | NA | No | 0.046491 |
| GGGGAGTA | 3 | 0.682407 | 3 | 0.692308 | 4.44222 | TACTCCCC | 17991 | No | 0.033211 |
| ATTAAAAT | 4 | 1.36853 | 4 | 1.4 | 4.29023 | ATTTTAAT | 16078 | No | 0.053723 |
| CGGAAACC | 3 | 0.750393 | 3 | 0.761905 | 4.15731 | GGTTTCCG | NA | No | 0.042101 |
| TGGGCGGA | 4 | 1.44679 | 4 | 1.48148 | 4.06778 | TCCGCCCA | NA | No | 0.063337 |
| CGGCGGCG | 3 | 0.787559 | 3 | 0.8 | 4.01229 | CGCCGCCG | 22091 | No | 0.047421 |
| TTTTTTGA | 3 | 0.787559 | 3 | 0.8 | 4.01229 | TCAAAAAA | NA | No | 0.047421 |
| TTTCTCCA | 4 | 1.48541 | 4 | 1.52174 | 3.96242 | TGGAGAAA | 2378 | No | 0.068398 |
| AGCCGGCT | 3 | 0.805285 | 3 | 0.818182 | 3.94551 | AGCCGGCT | 14 | Yes | 0.050071 |
| CCTCTTTA | 2 | 0.282982 | 2 | 0.285714 | 3.91104 | TAAAGAGG | NA | No | 0.033814 |
| CGCCCCTT | 6 | 3.12976 | 6 | 3.27273 | 3.90482 | AAGGGGCG | 21917 | No | 0.113859 |
| GCGCCGCG | 5 | 2.33164 | 5 | 2.41379 | 3.81433 | CGCGGCGC | 15062 | No | 0.097601 |
| ATTCCCAG | 3 | 0.843245 | 3 | 0.857143 | 3.80733 | CTGGGAAT | 21297 | No | 0.055985 |
| TCTCCCCT | 4 | 1.56036 | 4 | 1.6 | 3.7655 | AGGGGAGA | 18183 | No | 0.07881 |
| TCCGCCGG | 3 | 0.855341 | 3 | 0.869565 | 3.7646 | CCGGCGGA | NA | No | 0.057938 |
| CTCCCGCT | 3 | 0.867789 | 3 | 0.882353 | 3.72126 | AGCGGGAG | NA | No | 0.059981 |
| TGCGCCGA | 2 | 0.316812 | 2 | 0.32 | 3.68519 | TCGGCGCA | 3202 | No | 0.041483 |
| GGGCGCCC | 4 | 1.59514 | 4 | 1.63636 | 3.67732 | GGGCGCCC | 23 | Yes | 0.083901 |
| GTGCGTTT | 3 | 0.884961 | 3 | 0.9 | 3.66247 | AAACGCAC | NA | No | 0.062855 |
| TTGGTCTC | 4 | 1.60537 | 4 | 1.64706 | 3.65176 | GAGACCAA | NA | No | 0.085429 |
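
The derived columns can be recomputed from the raw ones. The sketch below is illustrative and is not the authors' code. It assumes that S and ES are an observed and an expected count, that the Sln(S/ES) column is the product S·ln(S/ES) with a natural logarithm, and that the Palindrome column marks words equal to their own reverse complement. The function names are hypothetical. The listed values are consistent with these readings to within rounding.

```python
import math

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word (A, C, G, T only)."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """A word is a reverse-complement palindrome if it equals its RevComp."""
    return word == reverse_complement(word)


def overrepresentation_score(s: float, es: float) -> float:
    """Assumed reading of the Sln(S/ES) column: S * ln(S / ES)."""
    return s * math.log(s / es)


# First three rows of the bidirectional set: (Word, S, ES).
rows = [
    ("TCGCGCCA", 4, 0.918299),
    ("TCCCGGGA", 8, 3.97165),
    ("GGCCCGCC", 10, 5.85012),
]

for word, s, es in rows:
    print(
        word,
        reverse_complement(word),
        "Yes" if is_palindrome(word) else "No",
        round(overrepresentation_score(s, es), 5),
    )
```

Sorting the rows by this score in descending order gives the ordering described in the caption.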


Lichtenberg et al. BMC Genomics 2009 10(Suppl 1):S18 doi:10.1186/1471-2164-10-S1-S18
