Table 16

Co-occurrence in Core Promoters

Word1     Word2     S   ES       S*ln(S/ES)
GCCCAATA  GCCCATTA  32  2.3492   83.5729
TTTTTTCT  TTTTTCTT  68  22.9531  73.8516
AATAAAAA  AAGAAAAA  84  41.5798  59.069
CTCTCTTT  CTTTCTCT  40  9.1626   58.95
AATAAAAA  ATTAAAAA  57  22.4453  53.1222
ACAAAAAA  AAGAAAAA  71  35.1265  49.9645
ACAAAAAA  AGAAAAAA  66  31.1075  49.6455
ATTTCTCA  TATAAATA  30  6.1031   47.772
AATAAAAA  TAAAAAAT  38  10.8748  47.5432
AAAAAACA  ACAAAAAA  56  24.4921  46.3121
AAAAATAT  AAAAAACA  44  15.5191  45.8533
AACAAAAA  AAGAAAAA  77  42.5433  45.6828
AACAAAAA  AGAAAAAA  69  37.6758  41.7512
TTTCTTTT  TTTTTTGT  40  14.2927  41.1653
AAAAAACA  ATATAAAG  30  7.659    40.9596
AAAAAACA  CTATATAA  36  11.9538  39.689
AAAAATAT  CTATATAA  30  8.0863   39.3309
TATATAAA  TAAAAAAT  36  12.3623  38.4793
AATAAAAA  TTAAAAAA  53  25.8324  38.0892
TTTTATTT  TTTTTTAA  38  14.0039  37.9336
TTTTATTT  TTTTTCTT  50  23.5743  37.5932
TTCTTTTT  TTTTTCTT  46  20.3942  37.416
AAATTAAA  ACAAAAAA  44  18.9721  37.0137
AATAAAAA  AGAAAAAA  65  36.8225  36.938
TTTCTTTT  TTTTTGTT  41  16.8429  36.4755

Overrepresented non-overlapping word-pairs detected in the core promoters of Arabidopsis thaliana. Each word-pair is characterized by its two nucleotide sequences (Word1 and Word2), the number of sequences in which the pair occurs (S), the expected number of such sequences (ES), and a statistical score quantifying the overrepresentation of the word-pair in this sequence set (S*ln(S/ES)).
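
The score column can be recomputed from S and ES with the formula given in the caption. The following is a minimal sketch, not the authors' implementation: the function name `overrepresentation_score` is hypothetical, and the two sample rows are copied from the table above. Because ES is reported to four decimal places, recomputed scores may differ from the tabulated ones in the last digits.

```python
import math


def overrepresentation_score(s: float, es: float) -> float:
    """Return S * ln(S / ES) for an observed count S and expected count ES.

    Hypothetical helper name; the formula itself is the one stated in the caption.
    """
    if s <= 0 or es <= 0:
        raise ValueError("S and ES must both be positive")
    return s * math.log(s / es)


# First two rows of the table: (Word1, Word2, S, ES)
rows = [
    ("GCCCAATA", "GCCCATTA", 32, 2.3492),
    ("TTTTTTCT", "TTTTTCTT", 68, 22.9531),
]

for word1, word2, s, es in rows:
    score = overrepresentation_score(s, es)
    print(f"{word1}  {word2}  {score:.4f}")
```

For these two rows the recomputed scores come out at roughly 83.57 and 73.85, in line with the tabulated values.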

Lichtenberg et al. BMC Genomics 2009 10:463   doi:10.1186/1471-2164-10-463
