Table 11

Words not detected in the Introns

#WORD       E_S        E
CGCGGACA    6.1805     6.4557
CCCGGGAG    4.57278    4.77632
CCGGCCCC    4.46781    4.66667
CGCCCCCC    4.45254    4.65072
GCCCACCG    4.16782    4.35331
GCCGCGGG    3.47686    3.63158
CCGAGGGG    3.34433    3.49315
AAGCGCCC    3.17737    3.31875
CGCCAGCG    2.99188    3.125
CGCTCGCG    2.91507    3.04478
GCGTCGCG    2.8245     2.95017
CCGGCACG    2.48216    2.59259
CCGGGGCG    2.25483    2.35514
CCCGCGCC    2.16189    2.25806
TCGGGCGC    2.11021    2.20408
GCGCACGG    2.02051    2.11039
CGCTCCGC    2.00514    2.09434
CGCGACGC    1.99945    2.0884
TGCGCCCG    1.9539     2.04082
GGTGCGCG    1.92911    2.01493
GCGGGCCC    1.90464    1.98936
CGCGGCGA    1.86163    1.94444
GCGCGACG    1.83299    1.91453
GGGCGGGC    1.79662    1.87654
CCGCCGGG    1.73887    1.81622

Top 25 words that were expected to occur in the introns but are absent from the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it occurs (E_S) and the expected total number of occurrences across the set of sequences (E). The words are sorted by E_S in descending order.
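
The selection described in the caption can be illustrated with a minimal sketch. It assumes that per-word statistics are already available as records holding an observed occurrence count together with E_S and E; the `WordStat` record, its `observed` field, and the function `top_missing_words` are hypothetical names introduced here for illustration, and the sketch does not show how the expected values themselves were computed.

```python
from typing import List, NamedTuple


class WordStat(NamedTuple):
    """Hypothetical record for one word (not the authors' data format)."""
    word: str
    observed: int   # observed total occurrences in the sequence set (assumed field)
    e_s: float      # expected number of sequences containing the word (E_S)
    e: float        # expected total number of occurrences (E)


def top_missing_words(stats: List[WordStat], n: int = 25) -> List[WordStat]:
    """Return up to n undetected words, sorted by E_S in descending order."""
    # Keep only words that never occur in the sequences.
    missing = [s for s in stats if s.observed == 0]
    # Highest expected sequence occurrence first.
    missing.sort(key=lambda s: s.e_s, reverse=True)
    return missing[:n]


# Three rows taken from the table, plus one invented detected word
# ("ACGTACGT", with made-up values) to show that detected words are filtered out.
example = [
    WordStat("CCCGGGAG", 0, 4.57278, 4.77632),
    WordStat("CGCGGACA", 0, 6.1805, 6.4557),
    WordStat("ACGTACGT", 3, 5.0, 5.2),
    WordStat("CCGGCCCC", 0, 4.46781, 4.66667),
]

for stat in top_missing_words(example):
    print(stat.word, stat.e_s, stat.e)
```

With real data, `stats` would hold one record per candidate word, and the default `n=25` would yield a list of the same shape as the table.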

Lichtenberg et al. BMC Genomics 2009 10:463   doi:10.1186/1471-2164-10-463
