Table 4

Top 2 clusters for the unidirectional promoter. The word-based clusters for the two most overrepresented words for the bidirectional promoters. Rank 1 refers to word ACCCGCCT and Rank 2 to CTTCTTTC.

(a) Rank 1


| Word | S | ES | O | EO | S·ln(S/ES) | Reverse complement | Position | Palindrome |
|---|---|---|---|---|---|---|---|---|
| ACCCGCCT | 4 | 0.716577 | 4 | 0.727273 | 6.87826 | AGGCGGGT | 19440 | No |
| ATCCGCCT | 1 | 0.132296 | 1 | 0.133333 | 2.02271 | AGGCGGAT | NA | No |
| ACCAGCCT | 2 | 0.738772 | 2 | 0.75 | 1.99183 | AGGCTGGT | 1303 | No |
| AGCCGCCT | 1 | 0.657331 | 1 | 0.666667 | 0.419567 | AGGCGGCT | 1056 | No |
| ACCCACCT | 1 | 0.738772 | 1 | 0.75 | 0.302766 | AGGTGGGT | NA | No |
| ACGCGCCT | 1 | 1.16147 | 1 | 1.18519 | -0.14969 | AGGCGCGT | NA | No |
| CCCCGCCT | 1 | 2.45503 | 2 | 2.54545 | -0.89814 | AGGCGGGG | 21912 | No |


(b) Rank 2


| Word | S | ES | O | EO | S·ln(S/ES) | Reverse complement | Position | Palindrome |
|---|---|---|---|---|---|---|---|---|
| CTTCTTTC | 5 | 1.7686 | 5 | 1.81818 | 5.19624 | GAAAGAAG | 13567 | No |
| CTACTTTC | 1 | 0.180301 | 1 | 0.181818 | 1.71313 | GAAAGTAG | NA | No |
| CTTCTTCC | 1 | 0.304671 | 1 | 0.307692 | 1.18852 | GGAAGAAG | 5306 | No |
| CTGCTTTC | 2 | 1.15305 | 2 | 1.17647 | 1.10147 | GAAAGCAG | 9703 | No |
| CGTCTTTC | 1 | 0.371023 | 1 | 0.375 | 0.991491 | GAAAGACG | 20167 | No |
| CTCCTTTC | 3 | 2.36561 | 3 | 2.45 | 0.712729 | GAAAGGAG | 11346 | No |
| CTTCTATC | 1 | 0.607134 | 1 | 0.615385 | 0.499005 | GATAGAAG | NA | No |
| CTTCCTTC | 1 | 0.921427 | 1 | 0.9375 | 0.0818318 | GAAGGAAG | 10908 | No |
| GTTCTTTC | 1 | 1.07027 | 1 | 1.09091 | -0.067912 | GAAAGAAC | 17502 | No |
| CTTTTTTC | 1 | 1.2055 | 1 | 1.23077 | -0.186894 | GAAAAAAG | NA | No |
| TTTCTTTC | 2 | 3.4628 | 2 | 3.63636 | -1.09786 | GAAAGAAA | NA | No |
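
The derived columns of both panels can be reproduced from the raw ones. The sketch below is illustrative only and is not the authors' code. It assumes the score column header means S × ln(S/ES), a reading that agrees with the tabulated values, and that a word counts as a palindrome when it equals its own reverse complement. The function names `score`, `reverse_complement`, `is_palindrome` and `hamming` are hypothetical. The Hamming distance helper reflects an observation about this table only: every word listed in a panel differs from that panel's top-ranked word at exactly one position.

```python
import math

# Watson-Crick pairing used to build the reverse complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def score(s, es):
    """Assumed reading of the score column: S * ln(S / ES), for S > 0 and ES > 0."""
    return s * math.log(s / es)


def reverse_complement(word):
    """Complement each base, reading the word from its last base to its first."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_palindrome(word):
    """Assumed definition: the word equals its own reverse complement."""
    return word == reverse_complement(word)


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    seed = "ACCCGCCT"
    # (word, S, ES) copied from the first two rows of panel (a).
    rows = [("ACCCGCCT", 4, 0.716577), ("ATCCGCCT", 1, 0.132296)]
    for word, s, es in rows:
        print(
            word,
            round(score(s, es), 5),
            reverse_complement(word),
            "Yes" if is_palindrome(word) else "No",
            hamming(word, seed),
        )
```

Running the script prints one line per row with the recomputed score, reverse complement, palindrome flag and distance to the top-ranked word, which can be compared against the corresponding table columns.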


Lichtenberg et al. BMC Genomics 2009 10(Suppl 1):S18   doi:10.1186/1471-2164-10-S1-S18
