|
Figure 2.
Vocabulary fingerprint for FADS1 and its aliases. Schematic description of a group-specific
informative vocabulary automatically extracted from a text corpus of PubMed abstracts.
In this example, two “synonyms” (green arrows) and one “ambiguous” alias (red arrow)
of the official gene symbol FADS1 (which encodes the enzyme fatty acid desaturase 1; blue
arrow) are distinguished by the algorithm with the baseline cut-off set at c = 0.05.
The internal control is the unrelated official gene symbol CLEC2B (black arrow). The
Jaccard distances to FADS1 are: 1) D5D = 0.937; 2) fatty acid desaturase 1 = 0.944;
3) TU12 = 1; 4) CLEC2B = 1. Yellow boxes = words from the group-specific informative
vocabulary that occur in the text corpora of a given gene symbol or alias.
Coimbra et al. BMC Genomics 2010 11(Suppl 5):S3 doi:10.1186/1471-2164-11-S5-S3 |
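The Jaccard distances quoted in the caption can be illustrated with a minimal sketch. It assumes the standard definition, one minus the size of the intersection over the size of the union, applied to the sets of informative words found in each term's corpus; the caption does not state the exact formulation used by the authors. The function name and the toy word sets are hypothetical and are not taken from the paper's data.

```python
def jaccard_distance(a, b):
    """Standard Jaccard distance between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty sets are identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy fingerprints: informative words present in each corpus.
symbol_words = {"desaturase", "fatty", "lipid", "arachidonic"}
synonym_words = {"desaturase", "fatty", "delta"}
unrelated_words = {"lectin", "receptor"}

d_syn = jaccard_distance(symbol_words, synonym_words)
d_unrel = jaccard_distance(symbol_words, unrelated_words)
print(d_syn, d_unrel)
```

Under this definition a distance of 1 means the two fingerprints share no informative words, which is consistent with the values reported for TU12 and the CLEC2B control.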