Table 12

Words not detected in the Core Promoters

WORD        E_S        E
CGCACACC    5.86109    6.3029
GTCCGAAC    5.46787    5.88
GCCCTATG    5.23895    5.6338
GGACGTCG    4.98873    5.36471
GGCCCTAG    4.47129    4.80822
CGCGAGCG    4.35999    4.68852
GATCCCCC    3.92081    4.21622
GGCCGCAT    3.82028    4.10811
TACCCAGG    3.80429    4.09091
GGCCCCTG    3.67267    3.94937
CGCATCCG    3.66922    3.94565
CACGCCGA    3.56933    3.83824
CCGGCCGC    3.51312    3.77778
CGCGGTCA    3.51079    3.77528
AGGGCCCT    3.50922    3.77358
GGCGCTGT    3.49296    3.7561
ACGCCCTG    3.45587    3.71622
GCGGACAC    3.30648    3.55556
AGTGGCGC    3.29952    3.54808
GGGCGTTC    3.26995    3.51628
CGCGCAAG    3.25481    3.5
ACCCGCGT    3.22635    3.46939
TTACCCCG    3.22482    3.46774
CCGGTGCG    3.18249    3.42222
TAGGGCCG    3.18249    3.42222

The 25 words with the highest expected occurrence in the core promoters that do not appear in any of the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it was computed to occur (E_S) and its expected total number of occurrences across the set of sequences (E). Words are sorted by expected sequence occurrence (E_S) in descending order.
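The selection described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record type `WordStat`, its `observed` field, and the function `top_missing_words` are hypothetical names introduced here, and the sketch assumes that E_S and E have already been computed for each word by whatever model the study used. The one entry with a non-zero observed count is made up purely to show the filtering step; the other values are taken from the table.

```python
from typing import List, NamedTuple


class WordStat(NamedTuple):
    """Hypothetical record for one word; field names are assumptions."""
    word: str        # nucleotide sequence of the word
    observed: int    # observed occurrences in the promoter set (assumed input)
    e_s: float       # expected number of sequences containing the word (E_S)
    e: float         # expected total number of occurrences (E)


def top_missing_words(stats: List[WordStat], n: int = 25) -> List[WordStat]:
    """Return the n undetected words with the highest E_S."""
    # Keep only words that never occur in the sequences.
    missing = [s for s in stats if s.observed == 0]
    # Sort by expected sequence occurrence, largest first.
    return sorted(missing, key=lambda s: s.e_s, reverse=True)[:n]


stats = [
    WordStat("GTCCGAAC", 0, 5.46787, 5.88),
    WordStat("CGCACACC", 0, 5.86109, 6.3029),
    WordStat("AAAAAAAA", 12, 9.5, 11.0),  # made-up detected word, illustration only
    WordStat("GCCCTATG", 0, 5.23895, 5.6338),
]

for s in top_missing_words(stats, n=2):
    print(s.word, s.e_s, s.e)
```

Filtering before sorting keeps detected words out of the ranking regardless of how large their expected values are, which matches the caption's description of the table.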

Lichtenberg et al. BMC Genomics 2009 10:463   doi:10.1186/1471-2164-10-463
