Table 12

Observation of keyword associations

| # | unigram | | | bigram | | | trigram | | |
|---|---|---|---|---|---|---|---|---|---|
| | $T_1$ | .. | $T_n$ | $T_{n+1}$ | .. | <a target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S9/S2/mathml/M53">View MathML</a> | <a target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S9/S2/mathml/M54">View MathML</a> | .. | <a target="_blank" href="http://www.biomedcentral.com/1471-2105/13/S9/S2/mathml/M55">View MathML</a> |
| | $k_1$ | .. | $k_n$ | $k_1k_2$ | .. | $k_{n-1}k_n$ | $k_1k_2k_3$ | .. | $k_{n-2}k_{n-1}k_n$ |
| 1 | 1 | .. | 1 | 0 | .. | 1 | 0 | .. | 1 |
| 2 | 1 | .. | 1 | 1 | .. | 1 | 1 | .. | 1 |
| .. | . | | . | . | | . | . | | . |
| N | 0 | .. | 0 | 0 | .. | 1 | 0 | .. | 1 |


Keyword associations are observed as follows: (1) 1-keyword subsequences are unigrams, 2-keyword subsequences are bigrams, 3-keyword subsequences are trigrams, and so on; (2) all unigrams, bigrams, trigrams and higher-order n-grams are defined as terms; (3) a passage scores 1 for a term if the term appears in it, and 0 otherwise; (4) each passage is thus represented as a binary (1-0) vector over the terms.
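
The encoding described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on two assumptions the caption does not settle: terms are taken to be contiguous n-grams of the ordered keyword list (as the column headers suggest), and a term is taken to "appear" in a passage when every one of its keywords occurs among the passage's whitespace-separated, lowercased tokens. The names `build_terms` and `passage_vector` and the sample keywords are hypothetical.

```python
from typing import List, Tuple

Term = Tuple[str, ...]


def build_terms(keywords: List[str], max_n: int = 3) -> List[Term]:
    """List contiguous n-grams of the keyword list: unigrams, then bigrams, then trigrams.

    Assumption: "subsequence" is read as a contiguous run of keywords.
    """
    terms: List[Term] = []
    for n in range(1, max_n + 1):
        for i in range(len(keywords) - n + 1):
            terms.append(tuple(keywords[i:i + n]))
    return terms


def passage_vector(passage: str, terms: List[Term]) -> List[int]:
    """Score each term 1 if it appears in the passage, otherwise 0.

    Assumption: a term appears when all of its keywords are among the
    lowercased, whitespace-separated tokens of the passage.
    """
    tokens = set(passage.lower().split())
    return [1 if all(k.lower() in tokens for k in term) else 0 for term in terms]


# Hypothetical keywords and passage, for illustration only.
keywords = ["protein", "binding", "site"]
terms = build_terms(keywords)
vec = passage_vector("The protein has a binding pocket", terms)
print(vec)  # [1, 1, 0, 1, 0, 0]
```

Each row of the table corresponds to one such vector: one passage scored against every term. A stricter matching rule, such as requiring the keywords of a term to be adjacent in the passage, would change only the test inside `passage_vector`.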

Hu et al. BMC Bioinformatics 2012 13(Suppl 9):S2   doi:10.1186/1471-2105-13-S9-S2
