Table 15

Co-occurrence in Introns

Word1      Word2      S     ES         S*ln(S/ES)
TTTTATTT   ATTTTTTA   393   217.8144   231.9354
TTTTTATT   ATTTTTTA   334   186.0726   195.3914
TAAAAAAT   AATATATT   147   39.3119    193.8792
TTTTTAAT   TTTTTATT   460   306.2869   187.084
TAAAAAAT   TTTTATTT   273   140.3538   181.6284
TAATTTTT   ATTTTTTA   238   113.2939   176.6639
CTCTGTTT   CTGTTTTT   346   208.3136   175.5583
TTTTATTT   AATATATT   308   175.8151   172.6854
TTTTATTT   TTTTTAAT   505   358.7745   172.6415
TAAAAAAT   ATTTTTTA   149   48.6332    166.8264
TAAAAAAT   TTTTTAAT   189   79.759     163.0573
TAAAAAAT   TAATTTTT   179   73.1119    160.2756
TTTTATTT   TAATTTTT   461   328.5857   156.0948
TTTTTAAT   ATTTTTTA   238   123.6151   155.9133
TAAAAAAT   TTTTTCTT   305   185.7949   151.1788
TAAAAAAT   TTTTTATT   230   119.9486   149.7338
TTTTTATT   AATATATT   261   150.2261   144.1709
TAATTTTT   TTTTTAAT   300   186.1617   143.1501
TTTTTAAT   AATATATT   202   99.8493    142.3303
TTTTATTT   TTTTTATT   670   542.1648   141.8441
TAAAAAAT   TTTTTTGT   262   157.163    133.898
TAATTTTT   AATATATT   187   91.5206    133.6198
ATTTTTTA   TTTTTTGT   354   243.9756   131.769
TAAAAAAT   TTTTGTTT   357   246.9371   131.5909
TTTTTAAT   TTTTTGTT   638   519.9558   130.5312

Overrepresented non-overlapping word-pairs detected in the introns of Arabidopsis thaliana. Each word-pair is described by its two nucleotide sequences (Word1 and Word2), the number of sequences in which the pair occurs (S), the expected number of sequences in which it occurs (ES), and a statistical score, S*ln(S/ES), that quantifies the overrepresentation of the word-pair in the sequence set.
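As a quick check of the score column, the following minimal Python sketch recomputes S*ln(S/ES) from the S and ES values of three rows of the table. The function name `overrepresentation_score` is hypothetical and does not come from the paper. The sketch assumes that ln denotes the natural logarithm, which reproduces the tabulated scores up to the rounding of ES.

```python
import math


def overrepresentation_score(s, es):
    """Return S * ln(S / ES) for an observed count S and expected count ES.

    Hypothetical helper for illustration only; assumes the natural logarithm.
    """
    if s <= 0 or es <= 0:
        raise ValueError("S and ES must both be positive")
    return s * math.log(s / es)


# (Word1, Word2, S, ES, tabulated score) copied from three rows of the table.
rows = [
    ("TTTTATTT", "ATTTTTTA", 393, 217.8144, 231.9354),
    ("TAAAAAAT", "AATATATT", 147, 39.3119, 193.8792),
    ("TTTTTAAT", "TTTTTGTT", 638, 519.9558, 130.5312),
]

for word1, word2, s, es, tabulated in rows:
    recomputed = overrepresentation_score(s, es)
    # Print the recomputed score next to the published one for comparison.
    print(f"{word1} {word2} recomputed={recomputed:.4f} tabulated={tabulated}")
```

Because the score multiplies the log-ratio by S, a pair with a modest ratio but a large count (for example S = 638, ES = 519.9558) can still rank in the table alongside pairs with a much higher ratio but fewer occurrences.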

Lichtenberg et al. BMC Genomics 2009 10:463   doi:10.1186/1471-2164-10-463
