Table 5

Edit cluster for bidirectional promoters. Word-based clusters, built with the edit distance metric, for the two most overrepresented words in the bidirectional promoters. Rank 1 refers to the word TCGCGCCA and Rank 2 to the word TCCCGGGA.

(a) Rank 1

Word      S  ES        O  EO        S*ln(S/ES)  RevComp.  Position  Palindrome
TCGCGCCA  4  0.918299  4  0.9375    5.88611     TGGCGCGA  12538     No
TCGCCCCA  3  0.805161  3  0.820513  3.94598     TGGGGCGA  2834      No
TAGCTCCA  2  0.352982  2  0.357143  3.46897     TGGAGCTA  NA        No
TCTCGCGA  2  0.438673  2  0.444444  3.0343      TCGCGAGA  4937      No
TCGCCACA  2  0.455424  2  0.461538  2.95935     TGTGGCGA  4669      No
...
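The score and reverse-complement columns of panel (a) can be recomputed from the other columns. The following is a minimal sketch, assuming the score column is S multiplied by ln(S/ES), a reading the tabulated values agree with. The function names `score`, `reverse_complement`, and `is_palindrome` are illustrative and do not come from the paper.

```python
import math

# Complement mapping for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def score(s, es):
    """Assumed form of the score column: S * ln(S / ES)."""
    return s * math.log(s / es)


def reverse_complement(word):
    """Complement every base, then reverse the word."""
    return word.translate(_COMPLEMENT)[::-1]


def is_palindrome(word):
    """A word is palindromic here if it equals its own reverse complement."""
    return word == reverse_complement(word)


# (word, S, ES, reported score, reported reverse complement) from panel (a).
rows = [
    ("TCGCGCCA", 4, 0.918299, 5.88611, "TGGCGCGA"),
    ("TCGCCCCA", 3, 0.805161, 3.94598, "TGGGGCGA"),
    ("TAGCTCCA", 2, 0.352982, 3.46897, "TGGAGCTA"),
]

for word, s, es, reported, revcomp in rows:
    print(word, round(score(s, es), 5), reverse_complement(word), is_palindrome(word))
```

Because ES is itself rounded in the table, the recomputed scores should be compared with the reported ones using a small tolerance rather than exact equality.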


(b) Rank 2

Word      S  ES        O  EO        S*ln(S/ES)  RevComp.  Position  Palindrome
TCCCGGGA  8  3.97165   8  4.26667   5.60208     TCCCGGGA  2         Yes
TCCCGGCT  6  2.54354   6  2.66667   5.14921     AGCCGGGA  NA        No
ATCCGGGA  2  0.395077  2  0.4       3.24364     TCCCGGAT  NA        No
TCTCGCGA  2  0.438673  2  0.444444  3.0343      TCGCGAGA  4937      No
TTCCTGGA  2  0.493082  2  0.5       2.80045     TCCAGGAA  9505      No
...
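To illustrate how cluster members relate to their seed word, the sketch below computes the standard Levenshtein edit distance (insertions, deletions, and substitutions, each at cost 1). This is an assumption: the table does not state which edit-distance variant or which threshold the authors used, and the helper `edit_distance` is a hypothetical name. For the rows shown, every listed member turns out to lie within two edits of its seed.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Seed word mapped to the other cluster members listed in the table.
clusters = {
    "TCGCGCCA": ["TCGCCCCA", "TAGCTCCA", "TCTCGCGA", "TCGCCACA"],
    "TCCCGGGA": ["TCCCGGCT", "ATCCGGGA", "TCTCGCGA", "TTCCTGGA"],
}

for seed, members in clusters.items():
    for word in members:
        print(seed, word, edit_distance(seed, word))
```

Note that TCTCGCGA appears in both clusters, which is consistent with it being two edits away from each seed.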


Lichtenberg et al. BMC Genomics 2009 10(Suppl 1):S18   doi:10.1186/1471-2164-10-S1-S18
