Table 10

Words not detected in the 5'UTRs

#WORD       E_S        E
GGAACTGC    5.1333     5.40909
GAGGACCC    5.02658    5.29661
GCCCTATA    5.015      5.2844
CCGTACCT    4.98236    5.25
GCGAGTAT    4.94491    5.21053
TATCGCAC    4.83088    5.09034
GGTTGCGG    4.69443    4.94652
GCGGAGTG    4.66421    4.91468
AGTACAGC    4.51745    4.76
GTGCCGAT    4.4368     4.675
GTCCTGGG    4.41572    4.65278
CGGCCGTG    4.3768     4.61176
GGTCGGGG    4.16843    4.39216
GTGCTGGG    4.13122    4.35294
TAGTGCAC    4.12843    4.35
TACCGGCC    4.08277    4.30189
GCCTACGC    4.03144    4.24779
CACCGCGG    3.94494    4.15663
GCGGCGTG    3.90217    4.11155
CGCCTTAG    3.77819    3.98089
CAGCCCAG    3.74709    3.94811
TGAACGGG    3.74703    3.94805
CGTACTGC    3.74638    3.94737
GTGCGCCG    3.68013    3.87755
AGTCCTGG    3.67692    3.87417

Top 25 words that were expected to occur in the 5'UTRs but are absent from the sequences. Each word is identified by its nucleotide sequence and is listed with the expected number of sequences in which it occurs (E_S) and the expected total number of occurrences in the set of sequences (E). The words are sorted by E_S in descending order.

Lichtenberg et al. BMC Genomics 2009 10:463   doi:10.1186/1471-2164-10-463
