EntrezGene official symbols with PubMed abstracts, and their aliases as classified by the algorithm. Description of data: 73 randomly chosen official gene symbols, each of which produced a text corpus of PubMed abstracts, together with their aliases. The algorithm classified each alias into one of four categories: “synonyms”, “ambiguous”, “aliases with PubMed abstract but not passing the filters”, or “aliases without PubMed abstracts”.

